Vector

ArcadeDB includes a native vector search engine for similarity-based retrieval of embeddings. Vector indexes are fully integrated into the SQL query engine and support ACID transactions, persistent storage, and automatic compaction.

How Vector Search Works

Vector search finds the nearest neighbors to a query vector in high-dimensional space. Instead of exact matching (like SQL WHERE), it finds the most similar items based on a distance or similarity metric.

Typical workflow:

Generate embeddings from your data using an external model (OpenAI, Sentence Transformers, etc.)
Store embeddings as vector properties on vertices or documents
Create a vector index on the property
Query with vectorNeighbors() to find the k most similar items

LSMVectorIndex Architecture

ArcadeDB’s vector index is built on two foundations:

LSM Tree storage — ArcadeDB’s proven LSM Tree architecture provides persistent, crash-safe storage with automatic compaction
JVector 4.0.0 — A high-performance vector search library that implements both HNSW (Hierarchical Navigable Small World) and Vamana (DiskANN) graph algorithms

The index stores vectors as a navigable graph where each node connects to its approximate nearest neighbors. Searches traverse this graph, narrowing in on the closest matches efficiently — typically in O(log n) time rather than O(n) brute-force scanning.

Flat vs Hierarchical Structure

The index supports two graph structures:

	Flat (default)	Hierarchical
Algorithm	Single-layer Vamana graph	Multi-layer HNSW with exponential decay
Build speed	Faster	10-20% slower
Disk usage	Baseline	5-15% larger
Best for	< 100K vectors, well-clustered data	100K+ vectors, 1024+ dimensions, diverse queries

Flat (default)

Hierarchical

Algorithm

Single-layer Vamana graph

Multi-layer HNSW with exponential decay

Build speed

Faster

10-20% slower

Disk usage

Baseline

5-15% larger

Best for

< 100K vectors, well-clustered data

100K+ vectors, 1024+ dimensions, diverse queries

Enable hierarchical mode with addHierarchy: true in the index metadata.

Similarity Functions

Three distance metrics are available:

Function	When to Use	Value Range
COSINE (default)	Text embeddings (BERT, GPT, Sentence Transformers). Direction matters, magnitude does not.	-1 to 1
DOT_PRODUCT	Normalized vectors where speed matters. 10-15% faster than COSINE.	Unbounded
EUCLIDEAN	Spatial data, point clouds, continuous measurements. Absolute distance matters.	0 to infinity

Function

When to Use

Value Range

COSINE (default)

Text embeddings (BERT, GPT, Sentence Transformers). Direction matters, magnitude does not.

-1 to 1

DOT_PRODUCT

Normalized vectors where speed matters. 10-15% faster than COSINE.

Unbounded

EUCLIDEAN

Spatial data, point clouds, continuous measurements. Absolute distance matters.

0 to infinity

If your embeddings are already L2-normalized (unit vectors), use DOT_PRODUCT for best performance — it produces the same ranking as COSINE but skips the normalization step.

Quantization

Quantization reduces memory usage by compressing vector components at the cost of slight accuracy loss:

Type	Memory Reduction	Speed	Recall	Use Case
NONE	Baseline	Baseline	100%	Small datasets (< 10K vectors), maximum accuracy
INT8 (recommended)	4x (75% savings)	10-15% faster	95-98%	Best balance of speed and accuracy for most workloads
BINARY	32x (97% savings)	15-20% faster	85-92%	Massive datasets, approximate search with reranking
PRODUCT	16-64x	Approximate	Varies	Very large datasets (100K+), enables zero-disk-I/O graph construction

Type

Memory Reduction

Speed

Recall

Use Case

NONE

Baseline

100%

Small datasets (< 10K vectors), maximum accuracy

INT8 (recommended)

4x (75% savings)

10-15% faster

95-98%

Best balance of speed and accuracy for most workloads

BINARY

32x (97% savings)

15-20% faster

85-92%

Massive datasets, approximate search with reranking

PRODUCT

16-64x

Approximate

Varies

Very large datasets (100K+), enables zero-disk-I/O graph construction

Use INT8 quantization for most use cases. It provides 4x memory savings with minimal accuracy loss and significantly faster search. Only use NONE for very small datasets where maximum precision matters.

Why INT8 is faster

Quantization doesn’t just save memory — it fundamentally changes how vectors are read during search.

Without quantization (NONE), each node visited during graph traversal requires a full document lookup: read the record from disk, deserialize it, and extract the vector property. With INT8, vectors are stored in compact contiguous index pages and read directly — no document deserialization needed.

In benchmarks with 500K 384-dimensional vectors (matching the all-MiniLM-L6-v2 embedding model), INT8 reduces search latency by 2.5x compared to NONE:

Quantization	Mean latency	p95 latency	Vector fetch path
NONE	3.50 ms	4.36 ms	Document lookup (random I/O)
INT8	1.59 ms	1.94 ms	Index pages (sequential I/O)

Quantization

Mean latency

p95 latency

Vector fetch path

NONE

3.50 ms

4.36 ms

Document lookup (random I/O)

INT8

1.59 ms

1.94 ms

Index pages (sequential I/O)

The difference becomes even more significant under memory pressure. With NONE quantization, vector data is 4x larger, evicting more data from memory caches and forcing real disk I/O. INT8 keeps the working set small enough to stay in memory even with constrained resources.

Quantization is transparent — queries work identically regardless of quantization setting. The index automatically quantizes on insert and dequantizes on retrieval.

When using PRODUCT quantization, the graph build uses Product Quantization scores instead of exact vector distances. PQ codes are compact and stay in memory, eliminating disk I/O during graph construction. This is most effective on large datasets (100K+ vectors) where PQ quality is sufficient.

Table 1. Memory and performance example: 100K vectors with 384 dimensions
Quantization	Vector memory	Search speed
NONE	156 MB	Baseline
INT8	39 MB	~2.5x faster
BINARY	5 MB	~3x faster (lower recall)

Vector Encoding (Pre-Quantized Ingest)

Available since ArcadeDB v26.5.1.

The encoding index option controls the wire and document-storage representation of vectors. It is distinct from quantization, which controls the index-internal compression scheme:

Knob What it changes Trade-off

Knob	What it changes	Trade-off
`encoding`	What the document column stores and what HTTP clients send	Wire payload, bucket bytes, client round trip
`quantization`	What the HNSW graph compresses internally	Index memory footprint, recall, search latency

encoding

What the document column stores and what HTTP clients send

Wire payload, bucket bytes, client round trip

quantization

What the HNSW graph compresses internally

Index memory footprint, recall, search latency

Two encodings are supported:

FLOAT32 (default) — the embedding property is ARRAY_OF_FLOATS (4 bytes per dimension). Backwards-compatible behaviour for every index created prior to v26.5.1.
INT8 — the embedding property is BINARY (one signed byte per dimension). Callers using providers that emit int8 directly (Cohere int8 endpoints, OpenAI text-embedding-3-large reduced precision, Sentence Transformers with int8 quantization) skip a precision-losing client-side int8 → float32 → server round trip. The HTTP payload and document bucket storage shrink 4x.

The HNSW graph still runs on float32 internally; the engine dequantizes on the read path using value / 127.0f (Cohere/OpenAI calibration). Native int8 HNSW is tracked upstream at datastax/jvector#665 — once the JVector contract widens, no schema change will be needed.

-- INT8-encoded ingest. Property MUST be BINARY.
CREATE PROPERTY Doc.embedding BINARY;
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
  "dimensions": 1024,
  "similarity": "COSINE",
  "encoding": "INT8"
};

The factory rejects two misconfigurations at index-creation time so the failure surfaces immediately rather than as silent mis-scaling later:

encoding=INT8 with a non-BINARY property (or encoding=FLOAT32 with a BINARY property) — the property type and encoding must agree.
encoding=INT8 combined with quantization=INT8 — redundant: the property is already byte-quantized at the wire level, so the index-internal scalar quantizer would re-quantize the dequantized floats. Pick one (encoding=INT8 for payload/storage savings, quantization=INT8 for index-internal compression), not both.

When a Cohere-style int8 source is not the origin of the bytes, note that Java’s byte range [-128, 127] includes a value the Cohere/OpenAI calibration never emits. The dequantizer clamps -128 up to -127 to keep the result inside [-1, 1] for COSINE; a one-time WARNING is logged the first time a -128 byte is encountered in a process so operators can investigate non-conforming sources.

Store Embeddings in an `EXTERNAL` Property

Available since ArcadeDB v26.5.1.

A vector embedding is typically the largest field on a record — a FLOAT32 1,536-dim embedding alone weighs 6 KB, an order of magnitude bigger than the topology (id, label, edge lists) it sits next to. Stored inline, every traversal, scan, or projection that does not touch the embedding still pages those 6 KB into the buffer cache. On a graph of a few million vertices that is enough to evict the useful topology working set long before the dataset is exhausted.

Declare the embedding property as EXTERNAL true so the bytes move to a paired external bucket. The main record then carries only an 8-byte pointer; the embedding is loaded lazily, on the queries that actually need it (vector search, similarity scoring, hybrid retrieval). See External property storage for the full description, including the per-bucket pairing, compression, and tiered-storage options.

-- FLOAT32 embedding stored in a paired external bucket.
CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS (EXTERNAL true);
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
  "dimensions": 1536,
  "similarity": "COSINE"
};

-- Same idea with INT8 wire/storage encoding -- still external, 4x smaller payload.
CREATE PROPERTY Doc.embedding BINARY (EXTERNAL true);
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
  "dimensions": 1024,
  "similarity": "COSINE",
  "encoding": "INT8"
};

When you should opt in:

The embedding is large compared to the rest of the record (any modern dense embedding qualifies).
Most of your workload is not vector search — graph traversals, filters by other properties, or full-record reads that project everything except the embedding all benefit immediately.
You plan to put the heavy payload on a different volume (e.g. embeddings on bulk storage, topology on NVMe) — set arcadedb.externalPropertyBucketPath before creating the property; see Tiered storage.

When inline storage is still the right default:

Records are tiny and the embedding is your hottest field (every query reads it). The extra pointer lookup is a per-read cost that adds up.
You are running a vector-only workload where every query projects embedding. There is nothing to save.

To migrate an existing populated type to external storage in one step, use ALTER PROPERTY … EXTERNAL true followed by REBUILD TYPE; see Migrating existing records.

Key Parameters

Parameter Default Purpose

Parameter	Default	Purpose
`dimensions`	(required)	Must match your embedding model output size
`similarity`	`COSINE`	Distance metric: COSINE, DOT_PRODUCT, or EUCLIDEAN
`encoding`	`FLOAT32`	Wire / document-storage encoding: FLOAT32 (`ARRAY_OF_FLOATS` property) or INT8 (`BINARY` property, byte per dim, 4x smaller payload). See Vector Encoding.
`quantization`	`NONE`	Index-internal compression: NONE, INT8, BINARY, or PRODUCT. INT8 is recommended for most use cases.
`efSearch`	adaptive	Search beam width at query time. Controls recall vs speed trade-off. See efSearch and Adaptive Search.
`maxConnections`	16	Connections per node. Higher = better recall, more memory
`beamWidth`	100	Search depth during build. Higher = better index quality, slower builds
`addHierarchy`	false	Enable multi-layer HNSW for large/complex datasets
`storeVectorsInGraph`	false	Co-locate vectors in graph file for faster retrieval at large scale
`buildGraphNow`	true	Build the HNSW graph immediately at index creation time. Set to `false` for deferred (lazy) building on first search.

dimensions

(required)

Must match your embedding model output size

similarity

COSINE

Distance metric: COSINE, DOT_PRODUCT, or EUCLIDEAN

encoding

FLOAT32

Wire / document-storage encoding: FLOAT32 (ARRAY_OF_FLOATS property) or INT8 (BINARY property, byte per dim, 4x smaller payload). See Vector Encoding.

quantization

NONE

Index-internal compression: NONE, INT8, BINARY, or PRODUCT. INT8 is recommended for most use cases.

efSearch

adaptive

Search beam width at query time. Controls recall vs speed trade-off. See efSearch and Adaptive Search.

maxConnections

Connections per node. Higher = better recall, more memory

beamWidth

100

Search depth during build. Higher = better index quality, slower builds

addHierarchy

false

Enable multi-layer HNSW for large/complex datasets

storeVectorsInGraph

false

Co-locate vectors in graph file for faster retrieval at large scale

buildGraphNow

true

Build the HNSW graph immediately at index creation time. Set to false for deferred (lazy) building on first search.

efSearch and Adaptive Search

The efSearch parameter controls how many candidate nodes the search explores in the vector graph. Higher values find more accurate results but take longer.

When efSearch is not explicitly set (either on the index or per-query), ArcadeDB uses an adaptive two-pass strategy:

First pass — Uses a moderate beam width (2 × k), which is sufficient for most queries on well-clustered data.
Second pass — If the first pass returns insufficient results, the search automatically widens the beam to 10 × k.

For small indexes (< 10K vectors), the full default efSearch is always used since the cost is negligible.

This adaptive behavior gives you fast queries on easy lookups while still maintaining recall on harder queries — without requiring any tuning.

Setting efSearch

You can set efSearch at three levels:

Per-query (highest priority) — pass as the 4th argument to vectorNeighbors(), either positionally or via the named options map:

-- Higher efSearch for a critical query that needs maximum recall (positional)
SELECT expand(vectorNeighbors('Doc[embedding]', [...], 10, 500))

-- Lower efSearch for a latency-sensitive query (positional)
SELECT expand(vectorNeighbors('Doc[embedding]', [...], 10, 30))

-- Options map form (extensible; also supports `filter`)
SELECT expand(vectorNeighbors('Doc[embedding]', [...], 10, { efSearch: 500 }))

Per-index — set in the index metadata at creation time:

CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
  dimensions: 1024,
  similarity: 'COSINE',
  efSearch: 200
}

Adaptive (default) — when neither per-query nor per-index efSearch is specified, the adaptive strategy described above is used.

For most workloads, the adaptive default works well. Only set efSearch explicitly if you need consistently high recall regardless of query difficulty, or if you have strict latency requirements.

Filtered Search

Vector search can be combined with a logical filter on the same type by passing a filter option containing the allowed RIDs. The HNSW traversal restricts itself to that set, so non-matching vectors are skipped without decoding.

-- Find the 10 most similar documents within a specific tenant and category
SELECT vectorNeighbors(
         'Document[embedding]',
         :queryVector,
         10,
         { filter: (SELECT @rid FROM Document WHERE tenantId = 'acme' AND category = 'finance') }
       )

The filter value accepts a list of RIDs, RID strings, or any Identifiable. It can be produced by a subquery, a query parameter, or built programmatically.

Very selective filters (only a tiny fraction of records match) can starve the HNSW beam; combine filter with a higher efSearch to preserve recall.

Partition-aware filterable HNSW

When the type uses partitioned(<key>) as its bucket selection strategy and the query’s WHERE clause binds the partition key to a literal, the planner narrows vector.neighbors / vector.sparseNeighbors to a single per-bucket HNSW graph automatically — no filter option needed:

SELECT vector.neighbors('Document[embedding]', :queryVector, 10) AS neighbors
  FROM Document
 WHERE tenant_id = 'acme'

With the partition strategy in place, this query touches only tenant_id = 'acme'’s HNSW graph instead of fanning out across every bucket. The optimisation kicks in transparently for both dense (`vector.neighbors) and sparse (vector.sparseNeighbors) search and composes with explicit filter lists when both are present.

The two approaches are complementary:

partitioned(<key>) + a WHERE predicate on the key prunes the bucket set before the HNSW traversal even starts. Best for high-cardinality scope keys (tenant, customer, region) where every query naturally carries the predicate.
filter: (SELECT @rid …) prunes individual RIDs during the HNSW traversal. Best for low-cardinality or query-shaped filters that don’t map cleanly onto a partition key.

For the multi-tenant SaaS case, partition-aware HNSW is usually the right primitive. See Schema design 101: choosing a bucket strategy for picking the partition key.

A single vertex type can have multiple vector indexes on different properties:

CREATE INDEX ON Product (imageEmbedding) LSM_VECTOR METADATA {dimensions: 512, similarity: 'COSINE'}
CREATE INDEX ON Product (textEmbedding)  LSM_VECTOR METADATA {dimensions: 768, similarity: 'COSINE'}

Query each index independently to search by image similarity, text similarity, or combine scores.

Integration with Other Models

Vector search combines naturally with ArcadeDB’s other data models:

Graph + Vectors — Find similar items, then traverse relationships to discover connected context (Graph RAG pattern)
Full-text + Vectors — Hybrid search combining keyword matching with semantic similarity (Knowledge Graph pattern)
Time Series + Vectors — Detect behavioral anomalies by comparing embedding patterns over time (Fraud Detection pattern)

SQL Example

Create a vector index and query it:

-- Create vertex type and property
CREATE VERTEX TYPE Document;
CREATE PROPERTY Document.content STRING;
CREATE PROPERTY Document.embedding ARRAY_OF_FLOATS;

-- Create vector index with 384 dimensions using COSINE similarity
CREATE INDEX ON Document (embedding) LSM_VECTOR METADATA {
  dimensions: 384,
  similarity: 'COSINE'
};

-- Query for the 10 nearest documents
-- Returns rows with .record (full document) and .distance (0 = identical for COSINE)
SELECT expand(vectorNeighbors('Document[embedding]', $queryVector, 10))

Java Example

Create and query a vector index programmatically:

import com.arcadedb.index.lsm.LSMVectorIndex;
import com.arcadedb.index.lsm.LSMVectorIndexBuilder;
import com.arcadedb.index.vector.VectorSimilarityFunction;

// Create index programmatically
final LSMVectorIndexBuilder builder = new LSMVectorIndexBuilder(
    database,
    "Document",
    new String[]{"embedding"})
    .withDimensions(384)
    .withSimilarity(VectorSimilarityFunction.COSINE)
    .withMaxConnections(16)
    .withBeamWidth(100);

final LSMVectorIndex index = builder.create();

// Query the index using SQL
final ResultSet resultSet = database.query("sql",
    "SELECT expand(vectorNeighbors('Document[embedding]', ?, 10))",
    queryVector);

Configuration Parameters

When creating LSMVectorIndex instances, the following parameters can be configured:

dimensions: The dimensionality of the vectors (must match your embedding model output)
similarity: The distance function for similarity calculation (COSINE, DOT_PRODUCT, EUCLIDEAN, etc.)
maxConnections: Maximum number of connections per layer in the HNSW graph (default: 16, increase for better recall)
beamWidth: Beam width for approximate nearest neighbor search (default: 100, increase for more accurate results)

Supported Similarity Functions

Measure Name Type

Measure	Name	Type
`COSINE`	Cosine Similarity	L₂
`DOT_PRODUCT`	Inner Product	L₂
`EUCLIDEAN`	Euclidean Distance	L₂

COSINE

Cosine Similarity

L₂

DOT_PRODUCT

Inner Product

L₂

EUCLIDEAN

Euclidean Distance

L₂

For more information on vector embeddings, see the Vector Embeddings section.

Sparse Vector Search

Available since ArcadeDB v26.5.1.

In addition to dense embeddings, ArcadeDB supports sparse vector retrieval via the LSM_SPARSE_VECTOR index. Sparse vectors carry only the non-zero dimensions of a high-dimensional space and are produced by learned-sparse retrieval models such as SPLADE, BGE-M3, OpenSearch’s opensearch-neural-sparse-encoding-multilingual-v1, and BM25-as-sparse-vector pipelines.

Sparse retrieval excels where dense embeddings collapse semantically distinct queries into the same point ("freeze card after theft" vs "unfreeze card after travel"), because the per-token weights expose the discriminating terms directly.

Storage layout

The index uses a posting-list inverted structure on the LSM-Tree backbone with composite key (int dim_id, RID rid, float weight). It inherits ACID transactions, WAL, replication, and compaction from the LSM-Tree, just like every other ArcadeDB index. Per-document data lives in two parallel array properties:

an ARRAY_OF_INTEGERS of non-zero dimension ids, and
an ARRAY_OF_FLOATS of the matching weights.

Both arrays must have the same length on every record. Weights must be non-negative; negative weights would corrupt the WAND per-dim upper bound used during retrieval and are rejected at write time. All standard sparse models (BM25, SPLADE, BGE-M3, Cohere sparse) emit non-negative weights, so this is not a real constraint in practice.

Schema setup

CREATE DOCUMENT TYPE Doc;
CREATE PROPERTY Doc.tokens  ARRAY_OF_INTEGERS;
CREATE PROPERTY Doc.weights ARRAY_OF_FLOATS;

-- LSM_SPARSE_VECTOR index. `dimensions` declares the vocabulary cap (0 = open-ended).
-- `modifier: 'IDF'` enables Robertson-Sparck-Jones IDF weighting at query time.
CREATE INDEX ON Doc (tokens, weights) LSM_SPARSE_VECTOR
  METADATA { "dimensions": 105000, "modifier": "IDF" };

Querying

Use vector.sparseNeighbors to retrieve top-K records by sparse dot product:

SELECT expand(`vector.sparseNeighbors`(
    'Doc[tokens,weights]',
    :queryIndices, :queryValues,
    50
))

Options map (5th argument) supports filter (allowed-RIDs whitelist), groupBy, groupSize — see Group-By Retrieval below.

Top-K algorithm

Retrieval uses document-at-a-time WAND with per-dim max_weight upper bounds: per-dim cursors traverse postings in RID order, the algorithm pivots to the smallest RID whose cumulative upper bound can still beat the current K-th best score, and skips cursors below the pivot via direct seeks. Only dot-product similarity is supported (cosine on sparse vectors is conventionally handled by L2-normalizing both the query and stored vectors at insert time).

The MVP works well up to roughly 10M sparse vectors. The scaling story past that point (BlockMax-WAND with per-page bounds, weight quantization, parallel per-segment scoring) is tracked separately.

Hybrid Search with `vector.fuse`

Available since ArcadeDB v26.5.1.

vector.fuse combines two or more ranked sub-pipelines into a single ranked top-K, server-side, in one query. It generalises the dense+sparse hybrid pattern but works with any source that yields (@rid, $score) rows: dense vector.neighbors, sparse vector.sparseNeighbors, full-text SEARCH_INDEX, or any plain SELECT … ORDER BY … LIMIT N.

SELECT expand(`vector.fuse`(
    `vector.neighbors`('Doc[dense]', :denseVec, 50),
    `vector.sparseNeighbors`('Doc[tokens,weights]', :qIdx, :qVal, 50),
    { fusion: 'RRF', groupBy: 'source_file', groupSize: 1 }
)) LIMIT 10

Fusion strategies

vector.fuse supports three strategies via the fusion option:

Strategy When to use

Strategy	When to use
`RRF` (default)	Reciprocal Rank Fusion. `score = sum_i weight_i / (k + rank_i)`. Rank-only, indifferent to score scales — best when sources have wildly different score distributions (e.g. cosine in [-1, 1] plus BM25 in [0, +inf)). Default `k = 60`.
`DBSF`	Distribution-Based Score Fusion (Qdrant 1.11+). Per-source scores normalised to [mean - 3sigma, mean + 3sigma], then weighted sum. Useful when sources produce comparable, roughly Gaussian distributions and you want score magnitude (not just rank) to influence fusion.
`LINEAR`	Per-source min-max normalisation, then weighted sum. Use when you have already-tuned weights from offline relevance experiments. Requires every source row to carry a numeric score.

RRF (default)

Reciprocal Rank Fusion. score = sum_i weight_i / (k + rank_i). Rank-only, indifferent to score scales — best when sources have wildly different score distributions (e.g. cosine in [-1, 1] plus BM25 in [0, +inf)). Default k = 60.

DBSF

Distribution-Based Score Fusion (Qdrant 1.11+). Per-source scores normalised to [mean - 3sigma, mean + 3sigma], then weighted sum. Useful when sources produce comparable, roughly Gaussian distributions and you want score magnitude (not just rank) to influence fusion.

LINEAR

Per-source min-max normalisation, then weighted sum. Use when you have already-tuned weights from offline relevance experiments. Requires every source row to carry a numeric score.

vector.fuse assumes "higher = better" for every source. It auto-flips the distance field exposed by vector.neighbors to a similarity at extract time so dense and sparse sources fuse without manual rescaling. If you build a custom source via SELECT @rid, <expr> AS score FROM …, expose score with similarity semantics for LINEAR and DBSF to behave correctly. RRF is rank-only so it tolerates either convention.

Weighted fusion

Per-source weights tilt the fused ranking:

-- Dense matters 3x more than sparse for this corpus.
SELECT expand(`vector.fuse`(
    `vector.neighbors`('Doc[dense]', :denseVec, 50),
    `vector.sparseNeighbors`('Doc[tokens,weights]', :qIdx, :qVal, 50),
    { fusion: 'RRF', weights: [3.0, 1.0] }
)) LIMIT 10

Three-way fusion (dense + sparse + full-text)

SELECT expand(`vector.fuse`(
    `vector.neighbors`('Doc[dense]', :denseVec, 100),
    `vector.sparseNeighbors`('Doc[tokens,weights]', :qIdx, :qVal, 100),
    (SELECT @rid, $score FROM Doc WHERE SEARCH_INDEX('Doc[content]', :keywords) = true),
    { fusion: 'RRF', k: 60 }
)) LIMIT 10

Outer composition

The fused output is itself a result set, so it composes with outer WHERE, ORDER BY, LIMIT, joins, and projections:

-- Score threshold + projection.
SELECT title, source_file, score
FROM (
  SELECT expand(`vector.fuse`(
      `vector.neighbors`('Doc[dense]', :denseVec, 50),
      `vector.sparseNeighbors`('Doc[tokens,weights]', :qIdx, :qVal, 50),
      { fusion: 'RRF' }
  ))
)
WHERE score >= 0.02
ORDER BY score DESC
LIMIT 10

Group-By Retrieval

Available since ArcadeDB v26.5.1.

vector.neighbors, vector.sparseNeighbors, and vector.fuse all accept groupBy and groupSize options for diversification at retrieval time, mirroring Qdrant’s query_points_groups semantics. Common pattern: best chunk per source document, deduped at the index level rather than over-fetched and post-partitioned in the application.

-- Top 10 distinct source files, best matching chunk from each.
SELECT expand(`vector.neighbors`(
    'Doc[embedding]', :queryVec, 10,
    { groupBy: 'source_file', groupSize: 1 }
))

The third positional argument (limit / k) becomes the max number of distinct groups when groupBy is set. Total returned rows are bounded by limit * groupSize. groupSize defaults to 1.

groupBy accepts dotted nested-field paths — metadata.author, provenance.source.id, etc. Each segment after the first descends one level via Map-typed properties or embedded documents; a missing segment lands the row in the null group.

groupBy composes with the existing filter option:

SELECT expand(`vector.neighbors`(
    'Doc[embedding]', :queryVec, 10,
    { filter: [#5:0, #5:1, #5:2], groupBy: 'source_file', groupSize: 1 }
))

Grouping is integrated into the index traversal proper, not applied as a post-filter on an over-fetched candidate pool:

Sparse (vector.sparseNeighbors) — the BlockMax-WAND DAAT loop runs with a per-group min-heap. A new group opens only while the heap holds fewer than limit distinct keys; within each group, a candidate replaces the group’s worst member when its score is higher. The pruning threshold tightens to the global per-group worst score once every group has reached groupSize, so the BMW pivot can skip whole posting-list regions that cannot beat any group’s current worst member. allowedRIDs is applied inline — no over-fetch.
Dense (vector.neighbors) — a group-aware Bits filter is plugged into JVector’s HNSW search. Candidates from a full group are rejected before scoring, so the graph traversal stops expanding into regions that cannot contribute new groups. The search budget is sized to limit * groupSize. Because Bits cannot consult scores (it gates eligibility before scoring), best-per-group is approximate: the first groupSize admitted candidates per group are kept, in HNSW visit order. HNSW visits approximately best-first from the entry node so the first encountered are usually among the best, but a strict best-per-group guarantee is not provided — raise efSearch if you need wider coverage.

Pathological combinations of limit and groupSize (e.g. limit=1000, groupSize=1000 would size a per-group state past the cap) are still rejected with an explicit over-fetch budget exceeded error to bound memory.

Vector

How Vector Search Works

LSMVectorIndex Architecture

Flat vs Hierarchical Structure

Similarity Functions

Quantization

Why INT8 is faster

Vector Encoding (Pre-Quantized Ingest)

Store Embeddings in an EXTERNAL Property

Key Parameters

efSearch and Adaptive Search

Setting efSearch

Filtered Search

Partition-aware filterable HNSW

Multi-Modal Search

Integration with Other Models

SQL Example

Java Example

Configuration Parameters

Supported Similarity Functions

Sparse Vector Search

Storage layout

Schema setup

Querying

Top-K algorithm

Hybrid Search with vector.fuse

Fusion strategies

Weighted fusion

Three-way fusion (dense + sparse + full-text)

Outer composition

Group-By Retrieval

Further Reading

Store Embeddings in an `EXTERNAL` Property

Hybrid Search with `vector.fuse`