Vector Functions
Vector functions provide comprehensive operations for vector embeddings, similarity search, and machine learning workflows. These functions are available in both SQL and Cypher queries.
Every vector function has two equivalent names: the namespaced form (for example, vector.clamp()) and the camelCase form (vectorClamp()). Prefer the namespaced form.
Aggregation Functions
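There are two distinct categories of "statistics" functions that are easy to confuse: aggregate functions such as vector.sum() and vector.avg(), which combine the vectors of many records element-wise, and per-vector functions such as vector.variance() and vector.stddev(), which reduce the elements of a single vector to a scalar.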
vector.sum()
Aggregate function: element-wise sum of vectors.
Syntax: vector.sum(<field>)
Returns: Vector - Sum of all vectors
SELECT vector.sum(embedding) FROM documents
vector.avg()
Aggregate function: element-wise average of vectors.
Syntax: vector.avg(<field>)
Returns: Vector - Average vector
SELECT vector.avg(embedding) FROM documents
Basic Operations
vector.dimension()
Returns the dimension of a vector (length of the underlying array).
Syntax: vector.dimension(<vector>)
Returns: Integer - Vector dimension. A NULL argument returns 0 (consistent with .size()/.length()).
SELECT vector.dimension([1.0, 2.0, 3.0])
-- Returns: 3
SELECT vector.dimension(null)
-- Returns: 0
RETURN vector.dimension([1.0, 2.0, 3.0]) AS dim
vector.add()
Returns element-wise sum of two vectors, or adds a scalar to every element (broadcasting). The scalar may be either argument since addition is commutative.
Syntax: vector.add(<vector1>, <vector2> | <scalar>)
Returns: Vector - Sum vector
SELECT vector.add([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
-- Returns: [3.0, 5.0, 7.0]
SELECT vector.add([1.0, 2.0, 3.0], 4.0)
-- Returns: [5.0, 6.0, 7.0] (scalar broadcasting)
vector.subtract()
Returns element-wise difference of two vectors, or subtracts with a scalar (broadcasting). Order is
preserved: vector - scalar subtracts the scalar from each element, while scalar - vector subtracts
each element from the scalar.
Syntax: vector.subtract(<vector1> | <scalar>, <vector2> | <scalar>)
Returns: Vector - Difference vector
SELECT vector.subtract([3.0, 5.0, 7.0], [1.0, 2.0, 3.0])
-- Returns: [2.0, 3.0, 4.0]
SELECT vector.subtract([1.0, 2.0, 3.0], 1.0)
-- Returns: [0.0, 1.0, 2.0] (vector - scalar)
SELECT vector.subtract(10.0, [1.0, 2.0, 3.0])
-- Returns: [9.0, 8.0, 7.0] (scalar - vector)
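The broadcasting rules of vector.add() and vector.subtract() can be mirrored in plain Python. This is an illustrative sketch only: the helper names are hypothetical, and raising an error on a dimension mismatch is an assumption rather than documented behavior.

```python
from numbers import Real


def _broadcast(a, b):
    """Expand a scalar operand to the length of the vector operand."""
    if isinstance(a, Real) and isinstance(b, Real):
        raise TypeError("at least one operand must be a vector")
    if isinstance(a, Real):
        return [float(a)] * len(b), list(b)
    if isinstance(b, Real):
        return list(a), [float(b)] * len(a)
    if len(a) != len(b):
        # Assumption: mismatched dimensions are rejected.
        raise ValueError("vectors must have the same dimension")
    return list(a), list(b)


def add(a, b):
    left, right = _broadcast(a, b)
    return [x + y for x, y in zip(left, right)]


def subtract(a, b):
    # Operand order is preserved, so subtract(10.0, v) differs from subtract(v, 10.0).
    left, right = _broadcast(a, b)
    return [x - y for x, y in zip(left, right)]
```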
vector.scale()
Scales vector element-wise by a scalar value.
Syntax: vector.scale(<vector>, <scalar>)
Returns: Vector - Scaled vector
SELECT vector.scale([1.0, 2.0, 3.0], 2.0)
-- Returns: [2.0, 4.0, 6.0]
vector.clip()
Clips (clamps) vector elements to a specified range [min, max]. Also available as vector.clamp() /
vectorClamp().
Syntax: vector.clip(<vector>, <min>, <max>)
Returns: Vector - Clipped vector
SELECT vector.clip([1.0, 5.0, 10.0], 2.0, 8.0)
-- Returns: [2.0, 5.0, 8.0]
SELECT vector.clamp([1.0, 5.0, 10.0], 2.0, 8.0)
-- Returns: [2.0, 5.0, 8.0] (clamp is an alias of clip)
Nearest Neighbor Search
vector.neighbors()
Returns k nearest neighbors from a vector index.
Syntax:
vector.neighbors(<index-spec>, <query-vector>, <k>)
vector.neighbors(<index-spec>, <query-vector>, <k>, <efSearch>)
vector.neighbors(<index-spec>, <query-vector>, <k>, { <option>: <value>, ... })
Parameters:
- index-spec - Index specification as 'TypeName[propertyName]'
- query-vector - Query vector or key to look up
- k - Number of neighbors to return
- efSearch - (optional, positional) Search beam width. Controls the trade-off between recall and speed. Higher values improve recall but increase latency. When omitted, the index uses adaptive efSearch.
Options map (optional, alternative to the positional efSearch argument):
| Key | Type | Description |
|---|---|---|
| efSearch | Integer | Same semantics as the positional form. |
| filter | List of RIDs or RID strings | Restricts the search to the provided set of records. Useful to combine a vector search with a logical filter: first select the candidate RIDs, then pass them as filter. |
Returns: List - Nearest neighbors with distances
-- Default (adaptive efSearch)
SELECT vector.neighbors('Document[embedding]', [0.1, 0.2, 0.3], 5)
-- Explicit efSearch for higher recall (positional form, backward compatible)
SELECT vector.neighbors('Document[embedding]', [0.1, 0.2, 0.3], 5, 500)
-- Options map form (named, extensible)
SELECT vector.neighbors('Document[embedding]', [0.1, 0.2, 0.3], 5, { efSearch: 500 })
-- Combine a vector search with a logical filter on the same type
SELECT vector.neighbors(
'Document[embedding]',
[0.1, 0.2, 0.3],
10,
{ filter: (SELECT @rid FROM Document WHERE tenantId = 'acme' AND category = 'finance') }
)
-- Using a parameter binding for the RID list
SELECT vector.neighbors('Document[embedding]', :queryVector, 10, { efSearch: 300, filter: :allowedRids })
The options map rejects unknown keys with a descriptive error, which catches typos in option names.
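To make the meaning of the filter option concrete, the sketch below runs an exact (brute-force) search restricted to an allowed set of record IDs. It is not how the HNSW index works internally. The record IDs, the sample data, and the choice of cosine distance are assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    # Assumes both vectors are non-zero.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def filtered_neighbors(records, query, k, allowed_rids=None):
    """records maps a record ID string to its vector (hypothetical layout)."""
    candidates = (
        (rid, vec) for rid, vec in records.items()
        if allowed_rids is None or rid in allowed_rids
    )
    scored = [(rid, cosine_distance(vec, query)) for rid, vec in candidates]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]


records = {
    "#1:0": [1.0, 0.0],
    "#1:1": [0.9, 0.1],
    "#1:2": [0.0, 1.0],
}
print(filtered_neighbors(records, [1.0, 0.0], 2, allowed_rids={"#1:1", "#1:2"}))
```

The closest record overall ("#1:0") is excluded because it is not in the allowed set, which is the effect the filter option is described as having.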
Cypher Usage:
The recommended way to use vector search from Cypher is via CALL, which uses the HNSW index for fast approximate nearest neighbor search:
// ArcadeDB-native syntax
CALL vector.neighbors('Document[embedding]', $queryVector, 10)
YIELD name, distance
RETURN name, distance
ORDER BY distance
// Neo4j-compatible syntax (returns node + score)
CALL db.index.vector.queryNodes('Document[embedding]', 10, $queryVector)
YIELD node, score
RETURN node.title AS title, score
ORDER BY score DESC
db.index.vector.queryNodes() returns score (cosine similarity, 1.0 = identical), while vector.neighbors() returns distance (0.0 = identical). The relationship is score = 1 - distance.
Type-Specific Search with Inheritance:
When a vector index exists on a parent type, you can search specific child types:
-- Search only in EMBEDDING_IMAGE records
SELECT vector.neighbors('EMBEDDING_IMAGE[vector]', $queryVector, 10)
-- Search across all types (parent + children)
SELECT vector.neighbors('EMBEDDING[vector]', $queryVector, 10)
Normalization and Norms
vector.normalize()
Normalizes vector to unit length (L2 norm = 1.0).
Syntax: vector.normalize(<vector>)
Returns: Vector - Normalized vector
SELECT vector.normalize([3.0, 4.0])
-- Returns: [0.6, 0.8]
vector.isnormalized()
Checks if vector is normalized (L2 norm ≈ 1.0). The optional tolerance defaults to 0.001, which is an
appropriate loose bound for ArcadeDB’s float vectors (sqrt(float eps) ≈ 3.45e-4); pass a smaller value
to tighten the check.
Syntax: vector.isnormalized(<vector>, [tolerance])
Returns: Boolean - true if normalized
SELECT vector.isnormalized([0.6, 0.8])
-- Returns: true
vector.magnitude()
Computes L2 norm (Euclidean length) of vector. Also available as vector.l2Norm() / vectorL2Norm(),
symmetric with vector.l1Norm() and vector.lInfNorm().
Syntax: vector.magnitude(<vector>)
Returns: Double - L2 norm
SELECT vector.magnitude([3.0, 4.0])
-- Returns: 5.0
SELECT vector.l2Norm([3.0, 4.0])
-- Returns: 5.0 (l2Norm is an alias of magnitude)
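The three functions above relate to each other as in the following Python sketch. The helper names are hypothetical. Comparing |norm - 1| against the tolerance and rejecting a zero vector are assumptions about behavior the text does not spell out. Python uses 64-bit floats, whereas ArcadeDB vectors are 32-bit, so low-order digits can differ.

```python
import math


def magnitude(v):
    """L2 norm (Euclidean length)."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    norm = magnitude(v)
    if norm == 0.0:
        # Assumption: a zero vector has no direction and is rejected.
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def is_normalized(v, tolerance=0.001):
    # Assumption: the check compares |norm - 1| with the tolerance.
    return abs(magnitude(v) - 1.0) <= tolerance
```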
Quantization
vector.quantizeint8()
Quantizes vector to 8-bit integers using min-max scaling.
Syntax: vector.quantizeint8(<vector>)
Returns: ByteArray - Quantized vector
SELECT vector.quantizeint8([0.1, 0.5, 0.9])
vector.dequantizeint8()
Dequantizes 8-bit integers back to float vector (approximate recovery). Pass the result of
vector.quantizeint8() directly (the min/max are read from it), or supply the bytes with explicit
min/max.
Syntax: vector.dequantizeint8(<result>) or vector.dequantizeint8(<quantized>, <min>, <max>)
Returns: Vector - Dequantized vector
SELECT vector.dequantizeint8(vector.quantizeint8([1.0, 2.0, 3.0]))
-- Returns: ~[1.0, 2.0, 3.0]
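Min-max scaling to 8 bits can be sketched as follows. This shows the general technique, not ArcadeDB's exact encoding: mapping min to -128 and max to 127, the rounding rule, and the handling of a constant vector are all assumptions.

```python
def quantize_int8(v):
    """Return (codes, lo, hi); codes are signed 8-bit integers."""
    lo, hi = min(v), max(v)
    # Assumption: a constant vector uses a dummy step of 1.0.
    step = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = [round((x - lo) / step) - 128 for x in v]
    return codes, lo, hi


def dequantize_int8(codes, lo, hi):
    step = (hi - lo) / 255.0 if hi > lo else 1.0
    return [(c + 128) * step + lo for c in codes]


codes, lo, hi = quantize_int8([1.0, 2.0, 3.0])
restored = dequantize_int8(codes, lo, hi)
print(codes, restored)
```

The recovery is approximate because each element is rounded to one of 256 levels, so the error per element is bounded by about half a step, (max - min) / 510.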
vector.quantizebinary()
Quantizes vector to binary (1 bit per dimension) using median threshold. The median is used (rather than a fixed value like 0) so the bits split ~50/50, which maximizes the information retained per bit for arbitrarily-centered embeddings.
Syntax: vector.quantizebinary(<vector>)
Returns: ByteArray - Binary quantized vector
SELECT vector.quantizebinary([0.1, 0.5, 0.9])
vector.dequantizebinary()
Reconstructs an approximate float vector from a binary-quantized result. Binary quantization keeps only
one bit per dimension (the sign relative to the median), so this is lossy: each bit maps to highValue
(default 1.0) when set or lowValue (default -1.0) when clear. Accepts the result of
vector.quantizebinary() directly.
Syntax: vector.dequantizebinary(<quantized> [, <lowValue>, <highValue>])
Returns: Vector - Reconstructed (sign) vector
SELECT vector.dequantizebinary(vector.quantizebinary([0.1, 0.5, 0.9]))
-- Returns: [-1.0, 1.0, 1.0]
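The median-threshold scheme can be sketched in a few lines of Python. The helper names are hypothetical, and bits are kept as a plain list rather than packed bytes. Setting the bit for elements equal to the median follows the example above, where 0.5 maps to 1.0.

```python
import statistics


def quantize_binary(v):
    threshold = statistics.median(v)
    # A bit is set when the element is at or above the median.
    return [1 if x >= threshold else 0 for x in v]


def dequantize_binary(bits, low_value=-1.0, high_value=1.0):
    return [high_value if bit else low_value for bit in bits]


print(dequantize_binary(quantize_binary([0.1, 0.5, 0.9])))
```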
Scoring and Fusion
vector.hybridscore()
Computes the weighted average of two scores: alpha * score1 + (1 - alpha) * score2.
Syntax: vector.hybridscore(<score1>, <score2>, <alpha>)
Returns: Double - Weighted average
SELECT vector.hybridscore(0.8, 0.6, 0.7)
-- Returns: 0.7 * 0.8 + 0.3 * 0.6 = 0.74
vector.multiscore()
Combines multiple scores using a fusion method.
Syntax: vector.multiscore(<scores>, <method>, [weights])
Methods:
- 'MAX' - Maximum score (ColBERT style)
- 'AVG' - Arithmetic average
- 'MIN' - Minimum score
- 'WEIGHTED' - Weighted average (requires weights)
Returns: Double - Combined score
SELECT vector.multiscore([0.9, 0.7, 0.8], 'MAX')
-- Returns: 0.9
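The arithmetic behind vector.hybridscore() and vector.multiscore() can be written out as a short Python sketch. The helper names are hypothetical, and dividing the WEIGHTED result by the sum of the weights is an assumption, since the text only says "weighted average".

```python
def hybrid_score(score1, score2, alpha):
    return alpha * score1 + (1.0 - alpha) * score2


def multi_score(scores, method, weights=None):
    method = method.upper()
    if method == "MAX":
        return max(scores)
    if method == "MIN":
        return min(scores)
    if method == "AVG":
        return sum(scores) / len(scores)
    if method == "WEIGHTED":
        if weights is None or len(weights) != len(scores):
            raise ValueError("WEIGHTED requires one weight per score")
        # Assumption: weights are normalized by their sum.
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown method: {method}")
```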
vector.rrfscore()
Computes Reciprocal Rank Fusion (RRF) for combining multiple rankings.
Syntax: vector.rrfscore(<rank1>, <rank2>, …, [{ k: <long> }]) or
vector.rrfscore([<ranks>], [{ k: <long> }])
Ranks may be passed as variadic arguments or grouped in a single array/list (consistent with
vector.multiscore). Every positional numeric argument is treated as a rank. The center constant k
(default 60) is set only via the trailing options map { k: <long> } - a bare trailing number is always a
rank, never k.
Parameters:
- k - Center rank constant (default: 60), set via { k: <long> }
Returns: Double - RRF score
SELECT vector.rrfscore(1, 2, 4)
-- Returns: 1/61 + 1/62 + 1/64
SELECT vector.rrfscore([1, 2, 4], { k: 100 })
-- Returns: 1/101 + 1/102 + 1/104
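The RRF formula shown in the examples, a sum of 1 / (k + rank) over all ranks, is easy to reproduce in Python. The fuse helper and the document IDs below are hypothetical and only illustrate how per-list ranks might be collected before scoring.

```python
def rrf_score(ranks, k=60):
    """Sum of 1 / (k + rank) over 1-based ranks."""
    return sum(1.0 / (k + rank) for rank in ranks)


def fuse(rankings, k=60):
    """Fuse ranked lists of IDs; an ID missing from a list contributes nothing."""
    ranks_by_id = {}
    for ranking in rankings:
        for position, item in enumerate(ranking, start=1):
            ranks_by_id.setdefault(item, []).append(position)
    scores = {item: rrf_score(ranks, k) for item, ranks in ranks_by_id.items()}
    return sorted(scores, key=scores.get, reverse=True)


vector_ranking = ["doc-a", "doc-b", "doc-c"]
keyword_ranking = ["doc-c", "doc-a", "doc-d"]
print(fuse([vector_ranking, keyword_ranking]))
```

A larger k flattens the difference between adjacent ranks, so items that appear in several lists gain relative to items that rank first in only one.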
vector.normalizescores()
Normalizes scores to [0, 1] range using min-max normalization.
Syntax: vector.normalizescores(<scores>)
Returns: Vector - Normalized scores
SELECT vector.normalizescores([1.0, 2.0, 3.0])
-- Returns: [0.0, 0.5, 1.0]
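Min-max normalization maps each score s to (s - min) / (max - min). A minimal Python sketch follows. Returning all zeros when every score is identical is an assumption, because that case is not described above.

```python
def normalize_scores(scores):
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: identical scores normalize to 0.0 to avoid dividing by zero.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]
```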
vector.scoretransform()
Transforms scores using various functions.
Syntax: vector.scoretransform(<score>, <method>)
Methods:
- 'LINEAR' - No transformation
- 'SIGMOID' - Logistic function, maps to (0, 1)
- 'TANH' - Hyperbolic tangent, maps to (-1, 1)
- 'LOG' / 'LN' - Natural logarithm (requires a positive score); LN is the clearer synonym
- 'EXP' - Exponential function
Returns: Double - Transformed score
SELECT vector.scoretransform(0.5, 'SIGMOID')
SELECT vector.scoretransform(0.5, 'TANH')
-- Returns: 0.4621172
SELECT vector.scoretransform(2.5, 'LN')
-- Returns: 0.9162907 (same as 'LOG')
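The listed transforms correspond to standard math functions, as the Python sketch below shows. The function name is hypothetical, and raising an error for a non-positive LOG/LN input is an assumption about how the "requires a positive score" rule is enforced.

```python
import math


def score_transform(score, method):
    method = method.upper()
    if method == "LINEAR":
        return score
    if method == "SIGMOID":
        return 1.0 / (1.0 + math.exp(-score))
    if method == "TANH":
        return math.tanh(score)
    if method in ("LOG", "LN"):
        if score <= 0:
            raise ValueError("LOG/LN requires a positive score")
        return math.log(score)
    if method == "EXP":
        return math.exp(score)
    raise ValueError(f"unknown method: {method}")
```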
Similarity and Distance
vector.dotproduct()
Computes dot product (inner product) between two vectors.
Syntax: vector.dotproduct(<vector1>, <vector2>)
Returns: Double - Dot product
SELECT vector.dotproduct([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
-- Returns: 32.0
vector.cosinesimilarity()
Computes cosine similarity between two vectors. Returns value between -1 and 1.
Syntax: vector.cosinesimilarity(<vector1>, <vector2>)
Returns: Double - Cosine similarity (-1 to 1)
SELECT vector.cosinesimilarity([1.0, 0.0], [1.0, 0.0])
-- Returns: 1.0 (identical direction)
SELECT vector.cosinesimilarity([1.0, 0.0], [0.0, 1.0])
-- Returns: 0.0 (orthogonal)
vector.l2distance()
Computes L2 distance (Euclidean distance) between two vectors.
Syntax: vector.l2distance(<vector1>, <vector2>)
Returns: Double - Euclidean distance
SELECT vector.l2distance([0.0, 0.0], [3.0, 4.0])
-- Returns: 5.0
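For reference, the three measures above reduce to the following formulas, written here as a plain Python sketch with hypothetical helper names. The cosine helper assumes neither vector is all zeros.

```python
import math


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # Assumes both vectors are non-zero.
    return dot_product(a, b) / (
        math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b))
    )


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For unit-length vectors the measures are linked: the squared L2 distance equals 2 - 2 * cosine similarity, so ranking by either one gives the same order.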
vector.approxdistance()
Computes approximate distance between quantized vectors without full dequantization. When you pass the
result objects of vector.quantizeint8()/vector.quantizebinary(), the mode is inferred and the third
argument can be omitted; the explicit mode is still required for raw byte arrays.
Syntax: vector.approxdistance(<quantized1>, <quantized2> [, <mode>])
Modes:
- 'INT8' - Faster than floats; preserves the top-k ordering of the true distances (the approximate scalar distances rank in the same order, even though their absolute values differ). Accepts a vector.quantizeint8() result or a raw int8 byte array.
- 'BINARY' - Very fast Hamming distance, 8x fewer operations. Requires the vector.quantizebinary() result object (raw packed bits are not enough - the Hamming normalization needs the original length).
Returns: Double - Approximate distance
-- mode inferred from the quantization result objects
SELECT vector.approxdistance(
vector.quantizeint8([1.0, 2.0, 3.0]),
vector.quantizeint8([1.0, 3.0, 3.0])
)
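The reason BINARY mode needs the original length can be seen in a small Python sketch of a normalized Hamming distance over packed bits. The bit order, the helper names, and normalizing by the dimension are assumptions made for illustration, not ArcadeDB's documented layout.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    packed = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def hamming_distance(packed_a, packed_b, dimension):
    # XOR marks differing bits; padding bits are zero in both inputs.
    differing = sum(bin(x ^ y).count("1") for x, y in zip(packed_a, packed_b))
    return differing / dimension


a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
b = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
print(hamming_distance(pack_bits(a), pack_bits(b), len(a)))
```

Ten bits pack into two bytes, and so would any length from 9 to 16, which is why the packed bytes alone cannot tell the normalization which divisor to use.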
Sparse Vectors
vector.densetosparse()
Converts dense vector to sparse representation.
Syntax: vector.densetosparse(<vector>, [threshold])
Parameters:
- threshold - Values below this are considered zero (default: 0.0)
Returns: SparseVector - Sparse representation
SELECT vector.densetosparse([0.5, 0.0, 0.1], 0.2)
-- Only keeps elements >= 0.2
vector.sparsetodense()
Converts sparse vector back to dense representation.
Syntax: vector.sparsetodense(<sparsevector>)
Returns: Vector - Dense vector
SELECT vector.sparsetodense(vector.sparsecreate([0, 2], [0.5, 0.3]))
vector.sparsecreate()
Creates sparse vector from indices and values.
Syntax: vector.sparsecreate(<indices>, <values>, [dimension])
Returns: SparseVector - Sparse vector
SELECT vector.sparsecreate([0, 2, 5], [0.5, 0.3, 0.8], 7)
vector.sparsedot()
Computes dot product between two sparse vectors.
Syntax: vector.sparsedot(<sparse1>, <sparse2>)
Returns: Double - Dot product
SELECT vector.sparsedot(
vector.densetosparse([1.0, 2.0, 3.0]),
vector.densetosparse([1.0, 1.0, 1.0])
)
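A sparse vector can be modeled as a dimension plus a map from index to non-zero value. The Python sketch below mirrors the four sparse functions on that model. The dictionary layout, the use of absolute values against the threshold, and defaulting the dimension to the largest index plus one are assumptions.

```python
def sparse_create(indices, values, dimension=None):
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    if dimension is None:
        # Assumption: the dimension defaults to the largest index + 1.
        dimension = max(indices) + 1 if indices else 0
    return {"dimension": dimension, "entries": dict(zip(indices, values))}


def dense_to_sparse(v, threshold=0.0):
    # Assumption: exact zeros are always dropped; others compare by magnitude.
    entries = {i: x for i, x in enumerate(v) if x != 0.0 and abs(x) >= threshold}
    return {"dimension": len(v), "entries": entries}


def sparse_to_dense(s):
    dense = [0.0] * s["dimension"]
    for i, x in s["entries"].items():
        dense[i] = x
    return dense


def sparse_dot(a, b):
    # Iterate over the smaller map and look up matching indices in the larger.
    small, large = sorted((a["entries"], b["entries"]), key=len)
    return sum(x * large[i] for i, x in small.items() if i in large)
```

Only indices present in both vectors contribute to the dot product, so the cost depends on the number of stored entries rather than on the dimension.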
vector.sparsity()
Returns a sparsity measure of a vector. The threshold is optional and defaults to sqrt(eps) for float
(~3.45e-4). An optional third argument selects the measure.
Syntax: vector.sparsity(<vector> [, <threshold> [, <mode>]])
Modes:
- 'FRACTION' (default) - fraction of elements with |x| < threshold, in [0, 1]
- 'L0' - the L0 pseudonorm: count of "significant" elements with |x| >= threshold (integer)
- 'GMEAN' - geometric mean of the absolute values, exp(mean(ln|x|)); returns 0 if any element is 0
Returns: Double (FRACTION, GMEAN) or Integer (L0)
SELECT vector.sparsity([0.01, 0.1, 0.05, 0.02], 0.06)
-- Returns: 0.75 (3 of 4 elements are below 0.06)
SELECT vector.sparsity([0.0, 1.0, 2.0, 3.0])
-- Returns: 0.25 (uses the default threshold; one near-zero element)
SELECT vector.sparsity([0.0, 1.0, 2.0, 3.0], 0.5, 'L0')
-- Returns: 3 (three significant elements)
SELECT vector.sparsity([1.0, 2.0, 4.0], 0.0, 'GMEAN')
-- Returns: 2.0 (geometric mean)
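The three sparsity measures can be reproduced with the Python sketch below. The function name is hypothetical. The default threshold is computed as the square root of the 32-bit float machine epsilon (2^-23), which matches the ~3.45e-4 figure quoted above.

```python
import math

FLOAT_EPS = 2.0 ** -23                    # machine epsilon of a 32-bit float
DEFAULT_THRESHOLD = math.sqrt(FLOAT_EPS)  # about 3.45e-4


def sparsity(v, threshold=DEFAULT_THRESHOLD, mode="FRACTION"):
    mode = mode.upper()
    if mode == "FRACTION":
        return sum(1 for x in v if abs(x) < threshold) / len(v)
    if mode == "L0":
        return sum(1 for x in v if abs(x) >= threshold)
    if mode == "GMEAN":
        if any(x == 0.0 for x in v):
            return 0.0  # ln(0) is undefined; the measure is defined as 0 here
        return math.exp(sum(math.log(abs(x)) for x in v) / len(v))
    raise ValueError(f"unknown mode: {mode}")
```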
Statistics
vector.variance()
Computes variance of vector elements.
Syntax: vector.variance(<vector>)
Returns: Double - Variance
SELECT vector.variance([1.0, 2.0, 3.0])
vector.stddev()
Computes standard deviation of vector elements.
Syntax: vector.stddev(<vector>)
Returns: Double - Standard deviation
SELECT vector.stddev([1.0, 2.0, 3.0])
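The text above does not say whether these functions divide by n (population) or by n - 1 (sample), and the examples show no return values. The Python sketch below computes both conventions so that a result from the database can be matched against one of them. The helper names and the sample flag are hypothetical.

```python
import math


def variance(v, sample=False):
    mean = sum(v) / len(v)
    squared = sum((x - mean) ** 2 for x in v)
    # Population variance divides by n, sample variance by n - 1.
    return squared / (len(v) - 1 if sample else len(v))


def stddev(v, sample=False):
    return math.sqrt(variance(v, sample))


print(variance([1.0, 2.0, 3.0]), variance([1.0, 2.0, 3.0], sample=True))
```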
vector.hasnan()
Checks if vector contains NaN values. A NULL element (e.g. produced by an invalid math op such as
sqrt(-1.0) that the engine coerces to NULL inside a collection) is treated as NaN.
Syntax: vector.hasnan(<vector>)
Returns: Boolean - true if contains NaN
SELECT vector.hasnan([1.0, 2.0, 3.0])
-- Returns: false
SELECT vector.hasnan([1.0, sqrt(-1.0), 3.0])
-- Returns: true (the NULL element counts as NaN)
vector.hasinf()
Checks if vector contains infinity values. A NULL element is treated as NaN, which is not infinite, so
it does not trigger a true result.
Syntax: vector.hasinf(<vector>)
Returns: Boolean - true if contains infinity
SELECT vector.hasinf([1.0, 10e400, 2.0])
-- Returns: true
vector.hasnull()
Checks if a vector contains any NULL element. Unlike vector.hasnan(), this distinguishes a genuine
NULL (a missing element) from a NaN float value. Primitive float[]/double[]/int[]/long[] inputs
can never hold a NULL and therefore always return false.
Syntax: vector.hasnull(<vector>)
Returns: Boolean - true if contains a NULL element
SELECT vector.hasnull([1.0, 2.0, 3.0])
-- Returns: false
SELECT vector.hasnull([1.0, sqrt(-1.0), 3.0])
-- Returns: true
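The way the three checks treat a NULL element can be summarized in a Python sketch, using None to stand in for NULL. The helper names are hypothetical, and the logic follows only what the descriptions above state.

```python
import math


def has_null(v):
    return any(x is None for x in v)


def has_nan(v):
    # A NULL element is treated as NaN.
    return any(x is None or math.isnan(x) for x in v)


def has_inf(v):
    # A NULL element counts as NaN, which is not infinite.
    return any(x is not None and math.isinf(x) for x in v)
```

A vector holding a NULL therefore reports true for both hasnan and hasnull, while a vector holding a genuine NaN float reports true only for hasnan.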
Utility Functions
vector.tostring()
Converts vector to string representation. Equivalent to the asString() method on a vector value
(e.g. embedding.asString('PYTHON')); both share the same formatter.
Syntax: vector.tostring(<vector>, [format])
Formats:
- 'COMPACT' - Single line [1.0, 2.0, 3.0] (default)
- 'PRETTY' - Multi-line with formatting
- 'PYTHON' - Python list format
- 'MATLAB' - space-separated row vector [1.0 2.0 3.0]
- 'MATLAB_COLUMN' - semicolon-separated column vector [1.0; 2.0; 3.0]
- 'JULIA' - Julia vector literal [1.0, 2.0, 3.0]
- 'NUMPY' - bare comma-separated 1.0, 2.0, 3.0 (no brackets), for numpy.fromstring(…, sep=",")
Returns: String - Formatted vector
SELECT vector.tostring([0.5, 0.25, 0.75], 'PYTHON')
SELECT vector.tostring([0.5, 0.25, 0.75], 'NUMPY')
-- Returns: 0.5, 0.25, 0.75