Vector Functions

Vector functions provide comprehensive operations for vector embeddings, similarity search, and machine learning workflows. These functions are available in both SQL and Cypher queries.

Every vector function has two equivalent names: the namespaced vector.<name>() form (e.g. vector.dimension()) and a camelCase alias kept for backward compatibility (e.g. vectorDimension()). They resolve to the same function and are case-insensitive.

Prefer the namespaced vector.<name>() form in new queries and examples: it groups the suite consistently with the other extended namespaces (geo., math., …​) and reads more clearly. The namespaced form can be called directly, without back-ticks (e.g. SELECT vector.cosineSimilarity(a, b)). All examples in this section use it.

Aggregation Functions

There are two distinct categories of "statistics" functions, which are easy to confuse:

  • Row aggregates that return a vector - vector.sum(), vector.avg(), vector.min(), vector.max() fold a column of vectors element-wise across rows. For example vector.avg(embedding) returns the centroid (the element-wise arithmetic mean) of all the embeddings in the result set.

  • Per-vector element statistics that return a scalar - vector.variance() and vector.stddev() operate on a single vector and return the variance / standard deviation of that vector’s elements. They are intentionally not row aggregates (unlike the scalar SQL variance()/stddev()).
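
The distinction can be sketched in plain Python. This is only an illustration of the arithmetic, not ArcadeDB's implementation: the helper names vector_avg and element_variance are hypothetical, and the population form of the variance is an assumption, since this section does not say whether the population or the sample form is used.

```python
from statistics import pvariance

def vector_avg(rows):
    """Row aggregate: element-wise mean across many vectors, returns a vector."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]

def element_variance(vector):
    """Per-vector statistic: variance of the elements of ONE vector, returns a scalar.
    Population variance is assumed here for illustration."""
    return pvariance(vector)

embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
centroid = vector_avg(embeddings)         # one value per dimension
spread = element_variance(embeddings[0])  # one value for the first vector only
```

vector_avg consumes every row and keeps the dimension, while element_variance looks at a single row and collapses it to one number.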

vector.sum()

Aggregate function: element-wise sum of vectors.

Syntax: vector.sum(<field>)

Returns: Vector - Sum of all vectors

SELECT vector.sum(embedding) FROM documents

vector.avg()

Aggregate function: element-wise average of vectors.

Syntax: vector.avg(<field>)

Returns: Vector - Average vector

SELECT vector.avg(embedding) FROM documents

vector.min()

Aggregate function: element-wise minimum of vectors.

Syntax: vector.min(<field>)

Returns: Vector - Minimum vector

SELECT vector.min(embedding) FROM documents

vector.max()

Aggregate function: element-wise maximum of vectors.

Syntax: vector.max(<field>)

Returns: Vector - Maximum vector

SELECT vector.max(embedding) FROM documents

Basic Operations

vector.dimension()

Returns the dimension of a vector (length of the underlying array).

Syntax: vector.dimension(<vector>)

Returns: Integer - Vector dimension. A NULL argument returns 0 (consistent with .size()/.length()).

SELECT vector.dimension([1.0, 2.0, 3.0])
-- Returns: 3

SELECT vector.dimension(null)
-- Returns: 0

// Cypher
RETURN vector.dimension([1.0, 2.0, 3.0]) AS dim

vector.add()

Returns element-wise sum of two vectors, or adds a scalar to every element (broadcasting). The scalar may be either argument since addition is commutative.

Syntax: vector.add(<vector1> | <scalar>, <vector2> | <scalar>)

Returns: Vector - Sum vector

SELECT vector.add([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
-- Returns: [3.0, 5.0, 7.0]

SELECT vector.add([1.0, 2.0, 3.0], 4.0)
-- Returns: [5.0, 6.0, 7.0]  (scalar broadcasting)

vector.subtract()

Returns the element-wise difference of two vectors, or broadcasts a scalar against a vector. Argument order is preserved: vector - scalar subtracts the scalar from each element, while scalar - vector subtracts each element from the scalar.

Syntax: vector.subtract(<vector1> | <scalar>, <vector2> | <scalar>)

Returns: Vector - Difference vector

SELECT vector.subtract([3.0, 5.0, 7.0], [1.0, 2.0, 3.0])
-- Returns: [2.0, 3.0, 4.0]

SELECT vector.subtract([1.0, 2.0, 3.0], 1.0)
-- Returns: [0.0, 1.0, 2.0]  (vector - scalar)

SELECT vector.subtract(10.0, [1.0, 2.0, 3.0])
-- Returns: [9.0, 8.0, 7.0]  (scalar - vector)
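
The order-preserving broadcasting rule can be written out as a short Python sketch. The function name subtract and the error handling are hypothetical choices for illustration; how ArcadeDB reports mismatched dimensions is not stated in this section.

```python
def subtract(left, right):
    """Element-wise left - right, broadcasting a scalar on either side."""
    left_is_vec = isinstance(left, (list, tuple))
    right_is_vec = isinstance(right, (list, tuple))
    if left_is_vec and right_is_vec:
        if len(left) != len(right):
            raise ValueError("vectors must have the same dimension")
        return [a - b for a, b in zip(left, right)]
    if left_is_vec:
        # vector - scalar: subtract the scalar from each element
        return [a - right for a in left]
    if right_is_vec:
        # scalar - vector: subtract each element from the scalar
        return [left - b for b in right]
    raise TypeError("at least one argument must be a vector")
```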

vector.scale()

Scales vector element-wise by a scalar value.

Syntax: vector.scale(<vector>, <scalar>)

Returns: Vector - Scaled vector

SELECT vector.scale([1.0, 2.0, 3.0], 2.0)
-- Returns: [2.0, 4.0, 6.0]

vector.clip()

Clips (clamps) vector elements to a specified range [min, max]. Also available as vector.clamp() / vectorClamp().

Syntax: vector.clip(<vector>, <min>, <max>)

Returns: Vector - Clipped vector

SELECT vector.clip([1.0, 5.0, 10.0], 2.0, 8.0)
-- Returns: [2.0, 5.0, 8.0]

SELECT vector.clamp([1.0, 5.0, 10.0], 2.0, 8.0)
-- Returns: [2.0, 5.0, 8.0]  (clamp is an alias of clip)

vector.neighbors()

Returns k nearest neighbors from a vector index.

Syntax:

vector.neighbors(<index-spec>, <query-vector>, <k>)
vector.neighbors(<index-spec>, <query-vector>, <k>, <efSearch>)
vector.neighbors(<index-spec>, <query-vector>, <k>, { <option>: <value>, ... })

Parameters:

  • index-spec - Index specification as 'TypeName[propertyName]'

  • query-vector - Query vector or key to look up

  • k - Number of neighbors to return

  • efSearch - (optional, positional) Search beam width. Controls the trade-off between recall and speed. Higher values improve recall but increase latency. When omitted, the index uses adaptive efSearch.

Options map (optional, alternative to the positional efSearch argument):

  • efSearch (Integer) - Same semantics as the positional form.

  • filter (List of RIDs or RID strings) - Restricts the search to the provided set of records. Useful for combining a vector search with a logical filter: first select the candidate RIDs, then pass them as filter.

Unknown keys are rejected.

Returns: List - Nearest neighbors with distances

-- Default (adaptive efSearch)
SELECT vector.neighbors('Document[embedding]', [0.1, 0.2, 0.3], 5)

-- Explicit efSearch for higher recall (positional form, backward compatible)
SELECT vector.neighbors('Document[embedding]', [0.1, 0.2, 0.3], 5, 500)

-- Options map form (named, extensible)
SELECT vector.neighbors('Document[embedding]', [0.1, 0.2, 0.3], 5, { efSearch: 500 })

-- Combine a vector search with a logical filter on the same type
SELECT vector.neighbors(
         'Document[embedding]',
         [0.1, 0.2, 0.3],
         10,
         { filter: (SELECT @rid FROM Document WHERE tenantId = 'acme' AND category = 'finance') }
       )

-- Using a parameter binding for the RID list
SELECT vector.neighbors('Document[embedding]', :queryVector, 10, { efSearch: 300, filter: :allowedRids })

The options map rejects unknown keys with a descriptive error to catch typos. Keys are case-sensitive: for example, passing { efsearch: 500 } (lowercase) fails.
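
To make the effect of filter concrete, here is a minimal Python sketch of an exact (brute-force) k-nearest-neighbor search restricted to an allowed set of record IDs. It only illustrates the semantics: the real function uses an approximate index, and the names neighbors, cosine_distance, and allowed_rids, as well as the choice of cosine distance, are assumptions made for this example.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def neighbors(records, query, k, allowed_rids=None):
    """Exact k-NN over {rid: vector}; allowed_rids plays the role of `filter`."""
    candidates = (
        (rid, vec) for rid, vec in records.items()
        if allowed_rids is None or rid in allowed_rids
    )
    scored = sorted((cosine_distance(query, vec), rid) for rid, vec in candidates)
    return [(rid, dist) for dist, rid in scored[:k]]

records = {"#1:0": [1.0, 0.0], "#1:1": [0.9, 0.1], "#1:2": [0.0, 1.0]}
top = neighbors(records, [1.0, 0.0], 2, allowed_rids={"#1:1", "#1:2"})
```

Record "#1:0" is the closest match overall, but it never appears in top because it is not in the allowed set.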

Cypher Usage:

The recommended way to use vector search from Cypher is via CALL, which uses the HNSW index for fast approximate nearest neighbor search:

// ArcadeDB-native syntax
CALL vector.neighbors('Document[embedding]', $queryVector, 10)
YIELD name, distance
RETURN name, distance
ORDER BY distance

// Neo4j-compatible syntax (returns node + score)
CALL db.index.vector.queryNodes('Document[embedding]', 10, $queryVector)
YIELD node, score
RETURN node.title AS title, score
ORDER BY score DESC

db.index.vector.queryNodes() returns score (cosine similarity, 1.0 = identical), while vector.neighbors() returns distance (0.0 = identical). The relationship is score = 1 - distance.

Type-Specific Search with Inheritance:

When a vector index exists on a parent type, you can search specific child types:

-- Search only in EMBEDDING_IMAGE records
SELECT vector.neighbors('EMBEDDING_IMAGE[vector]', $queryVector, 10)

-- Search across all types (parent + children)
SELECT vector.neighbors('EMBEDDING[vector]', $queryVector, 10)

Normalization and Norms

vector.normalize()

Normalizes vector to unit length (L2 norm = 1.0).

Syntax: vector.normalize(<vector>)

Returns: Vector - Normalized vector

SELECT vector.normalize([3.0, 4.0])
-- Returns: [0.6, 0.8]

vector.isnormalized()

Checks if vector is normalized (L2 norm ≈ 1.0). The optional tolerance defaults to 0.001, which is an appropriate loose bound for ArcadeDB’s float vectors (sqrt(float eps) ≈ 3.45e-4); pass a smaller value to tighten the check.

Syntax: vector.isnormalized(<vector>, [tolerance])

Returns: Boolean - true if normalized

SELECT vector.isnormalized([0.6, 0.8])
-- Returns: true
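
A minimal Python sketch of normalization and the tolerance check follows. The helper names are hypothetical, and two details are assumptions rather than documented behavior: that the check compares the norm itself (not the squared norm) against 1.0, and that a zero vector is rejected.

```python
import math

def normalize(v):
    """Divide every element by the L2 norm so the result has unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")  # assumed behavior
    return [x / norm for x in v]

def is_normalized(v, tolerance=0.001):
    """True when the L2 norm is within `tolerance` of 1.0."""
    norm = math.sqrt(sum(x * x for x in v))
    return abs(norm - 1.0) <= tolerance
```

With the default tolerance a vector of norm 1.0005 still passes, while a tolerance of 0.0001 rejects it.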

vector.magnitude()

Computes L2 norm (Euclidean length) of vector. Also available as vector.l2Norm() / vectorL2Norm(), symmetric with vector.l1Norm() and vector.lInfNorm().

Syntax: vector.magnitude(<vector>)

Returns: Double - L2 norm

SELECT vector.magnitude([3.0, 4.0])
-- Returns: 5.0

SELECT vector.l2Norm([3.0, 4.0])
-- Returns: 5.0  (l2Norm is an alias of magnitude)

vector.l1norm()

Computes L1 norm (Manhattan norm) of vector.

Syntax: vector.l1norm(<vector>)

Returns: Double - L1 norm

SELECT vector.l1norm([1.0, 2.0, 3.0])
-- Returns: 6.0

vector.linfnorm()

Computes L∞ norm (maximum absolute value) of vector.

Syntax: vector.linfnorm(<vector>)

Returns: Double - L∞ norm

SELECT vector.linfnorm([1.0, -5.0, 3.0])
-- Returns: 5.0
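
The three norms differ only in how the absolute values are combined, as this short Python sketch shows (the function names are illustrative, not part of the API):

```python
import math

def l1_norm(v):
    """Manhattan norm: sum of absolute values."""
    return sum(abs(x) for x in v)

def l2_norm(v):
    """Euclidean norm: square root of the sum of squares."""
    return math.sqrt(sum(x * x for x in v))

def linf_norm(v):
    """Maximum norm: largest absolute value."""
    return max(abs(x) for x in v)
```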

Quantization

vector.quantizeint8()

Quantizes vector to 8-bit integers using min-max scaling.

Syntax: vector.quantizeint8(<vector>)

Returns: ByteArray - Quantized vector

SELECT vector.quantizeint8([0.1, 0.5, 0.9])

vector.dequantizeint8()

Dequantizes 8-bit integers back to float vector (approximate recovery). Pass the result of vector.quantizeint8() directly (the min/max are read from it), or supply the bytes with explicit min/max.

Syntax: vector.dequantizeint8(<result>) or vector.dequantizeint8(<quantized>, <min>, <max>)

Returns: Vector - Dequantized vector

SELECT vector.dequantizeint8(vector.quantizeint8([1.0, 2.0, 3.0]))
-- Returns: ~[1.0, 2.0, 3.0]
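
The round trip can be sketched in Python as one common min-max scheme. This is an assumption-laden illustration: the exact mapping, rounding rule, and result layout used by vector.quantizeint8() are not specified in this section, and the dictionary fields below are hypothetical.

```python
def quantize_int8(v):
    """Map [min, max] linearly onto the signed 8-bit range [-128, 127]."""
    lo, hi = min(v), max(v)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # constant vector: every element maps to the same code
    codes = [round((x - lo) / scale) - 128 for x in v]
    return {"codes": codes, "min": lo, "max": hi}

def dequantize_int8(q):
    """Approximate inverse: the error is at most about half a quantization step."""
    lo, hi = q["min"], q["max"]
    scale = (hi - lo) / 255.0
    return [(c + 128) * scale + lo for c in q["codes"]]
```

Because min and max travel with the codes, the one-argument form of dequantization needs nothing else; raw bytes alone would not be enough to recover the original scale.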

vector.quantizebinary()

Quantizes vector to binary (1 bit per dimension) using median threshold. The median is used (rather than a fixed value like 0) so the bits split ~50/50, which maximizes the information retained per bit for arbitrarily-centered embeddings.

Syntax: vector.quantizebinary(<vector>)

Returns: ByteArray - Binary quantized vector

SELECT vector.quantizebinary([0.1, 0.5, 0.9])

vector.dequantizebinary()

Reconstructs an approximate float vector from a binary-quantized result. Binary quantization keeps only one bit per dimension (the sign relative to the median), so this is lossy: each bit maps to highValue (default 1.0) when set or lowValue (default -1.0) when clear. Accepts the result of vector.quantizebinary() directly.

Syntax: vector.dequantizebinary(<quantized> [, <lowValue>, <highValue>])

Returns: Vector - Reconstructed (sign) vector

SELECT vector.dequantizebinary(vector.quantizebinary([0.1, 0.5, 0.9]))
-- Returns: [-1.0, 1.0, 1.0]
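
A Python sketch of median-threshold binary quantization and its lossy reconstruction is shown below. It assumes that an element equal to the median sets the bit, which matches the example above; the bits are left unpacked for readability, and the dictionary layout is hypothetical.

```python
from statistics import median

def quantize_binary(v):
    """One bit per dimension: 1 if the element is at or above the median."""
    threshold = median(v)
    bits = [1 if x >= threshold else 0 for x in v]
    return {"bits": bits, "length": len(v)}

def dequantize_binary(q, low_value=-1.0, high_value=1.0):
    """Each set bit becomes high_value, each clear bit becomes low_value."""
    return [high_value if bit else low_value for bit in q["bits"]]
```

Only the side of the median survives, so [0.1, 0.5, 0.9] and [0.4, 0.5, 0.6] reconstruct to the same vector.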

Scoring and Fusion

vector.hybridscore()

Computes weighted average of two scores.

Syntax: vector.hybridscore(<score1>, <score2>, <alpha>)

Returns: Double - Weighted average

SELECT vector.hybridscore(0.8, 0.6, 0.7)
-- Returns: 0.7 * 0.8 + 0.3 * 0.6 = 0.74
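
The formula in the example above, alpha * score1 + (1 - alpha) * score2, can be expressed as a one-line Python function (the name hybrid_score is illustrative):

```python
def hybrid_score(score1, score2, alpha):
    """Weighted average: alpha weights score1, (1 - alpha) weights score2."""
    return alpha * score1 + (1.0 - alpha) * score2
```

An alpha of 1.0 returns score1 unchanged and an alpha of 0.0 returns score2.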

vector.multiscore()

Combines multiple scores using a fusion method.

Syntax: vector.multiscore(<scores>, <method>, [weights])

Methods:

  • 'MAX' - Maximum score (ColBERT style)

  • 'AVG' - Arithmetic average

  • 'MIN' - Minimum score

  • 'WEIGHTED' - Weighted average (requires weights)

Returns: Double - Combined score

SELECT vector.multiscore([0.9, 0.7, 0.8], 'MAX')
-- Returns: 0.9
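
The four fusion methods can be sketched in Python as follows. The function name is hypothetical, and dividing the weighted sum by the sum of the weights is an assumption, since this section does not say whether the weights are expected to be pre-normalized.

```python
def multi_score(scores, method, weights=None):
    """Combine several scores into one using the named fusion method."""
    if method == "MAX":
        return max(scores)
    if method == "MIN":
        return min(scores)
    if method == "AVG":
        return sum(scores) / len(scores)
    if method == "WEIGHTED":
        if weights is None or len(weights) != len(scores):
            raise ValueError("WEIGHTED requires one weight per score")
        # Assumption: normalize by the total weight
        return sum(w * s for w, s in zip(weights, scores)) / sum(weights)
    raise ValueError(f"unknown method: {method}")
```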

vector.rrfscore()

Computes Reciprocal Rank Fusion (RRF) for combining multiple rankings.

Syntax: vector.rrfscore(<rank1>, <rank2>, ..., [{ k: <long> }]) or vector.rrfscore([<ranks>], [{ k: <long> }])

Ranks may be passed as variadic arguments or grouped in a single array/list (consistent with vector.multiscore). Every positional numeric argument is treated as a rank. The center constant k (default 60) is set only via the trailing options map { k: <long> } - a bare trailing number is always a rank, never k.

Parameters:

  • k - Center rank constant (default: 60), via { k: <long> }

Returns: Double - RRF score

SELECT vector.rrfscore(1, 2, 4)
-- Returns: 1/61 + 1/62 + 1/64

SELECT vector.rrfscore([1, 2, 4], { k: 100 })
-- Returns: 1/101 + 1/102 + 1/104
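
As the examples show, each rank contributes 1 / (k + rank) and the contributions are summed. A minimal Python sketch of that arithmetic (the function name is illustrative):

```python
def rrf_score(ranks, k=60):
    """Reciprocal Rank Fusion: sum of 1 / (k + rank) over all rankings."""
    return sum(1.0 / (k + rank) for rank in ranks)
```

A larger k flattens the difference between a rank of 1 and a rank of 10, so no single ranking dominates the fused score.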

vector.normalizescores()

Normalizes scores to [0, 1] range using min-max normalization.

Syntax: vector.normalizescores(<scores>)

Returns: Vector - Normalized scores

SELECT vector.normalizescores([1.0, 2.0, 3.0])
-- Returns: [0.0, 0.5, 1.0]
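
Min-max normalization maps the smallest score to 0 and the largest to 1. The Python sketch below illustrates it; returning all zeros when every score is equal is an assumption, because the behavior for that edge case is not described here.

```python
def normalize_scores(scores):
    """Rescale scores linearly so that min -> 0.0 and max -> 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]  # assumed handling of a constant input
    return [(s - lo) / (hi - lo) for s in scores]
```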

vector.scoretransform()

Transforms scores using various functions.

Syntax: vector.scoretransform(<score>, <method>)

Methods:

  • 'LINEAR' - No transformation

  • 'SIGMOID' - Logistic function, maps to (0, 1)

  • 'TANH' - Hyperbolic tangent, maps to (-1, 1)

  • 'LOG' / 'LN' - Natural logarithm (requires a positive score); LN is the clearer synonym

  • 'EXP' - Exponential function

Returns: Double - Transformed score

SELECT vector.scoretransform(0.5, 'SIGMOID')

SELECT vector.scoretransform(0.5, 'TANH')
-- Returns: 0.4621172

SELECT vector.scoretransform(2.5, 'LN')
-- Returns: 0.9162907  (same as 'LOG')
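
The transformations correspond to standard math functions. A Python sketch using the standard library (the function name and the error types are illustrative):

```python
import math

def score_transform(score, method):
    """Apply the named transformation to a single score."""
    if method == "LINEAR":
        return score
    if method == "SIGMOID":
        return 1.0 / (1.0 + math.exp(-score))
    if method == "TANH":
        return math.tanh(score)
    if method in ("LOG", "LN"):
        if score <= 0:
            raise ValueError("LOG/LN requires a positive score")
        return math.log(score)  # natural logarithm
    if method == "EXP":
        return math.exp(score)
    raise ValueError(f"unknown method: {method}")
```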

Similarity and Distance

vector.dotproduct()

Computes dot product (inner product) between two vectors.

Syntax: vector.dotproduct(<vector1>, <vector2>)

Returns: Double - Dot product

SELECT vector.dotproduct([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
-- Returns: 32.0

vector.cosinesimilarity()

Computes cosine similarity between two vectors. Returns value between -1 and 1.

Syntax: vector.cosinesimilarity(<vector1>, <vector2>)

Returns: Double - Cosine similarity (-1 to 1)

SELECT vector.cosinesimilarity([1.0, 0.0], [1.0, 0.0])
-- Returns: 1.0 (identical direction)

SELECT vector.cosinesimilarity([1.0, 0.0], [0.0, 1.0])
-- Returns: 0.0 (orthogonal)

vector.l2distance()

Computes L2 distance (Euclidean distance) between two vectors.

Syntax: vector.l2distance(<vector1>, <vector2>)

Returns: Double - Euclidean distance

SELECT vector.l2distance([0.0, 0.0], [3.0, 4.0])
-- Returns: 5.0
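
For reference, the three measures in this group can be written out in Python. The names are illustrative, and raising an error for a zero-length vector in cosine_similarity is an assumption about an edge case this section does not cover.

```python
import math

def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the two L2 norms."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)

def l2_distance(a, b):
    """Euclidean distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For unit-length vectors the cosine similarity equals the dot product, which is why normalized embeddings are often compared with the cheaper dot product.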

vector.approxdistance()

Computes approximate distance between quantized vectors without full dequantization. When you pass the result objects of vector.quantizeint8()/vector.quantizebinary(), the mode is inferred and the third argument can be omitted; the explicit mode is still required for raw byte arrays.

Syntax: vector.approxdistance(<quantized1>, <quantized2> [, <mode>])

Modes:

  • 'INT8' - Faster than floats; preserves the top-k ordering of the true distances (the approximate scalar distances rank in the same order, even though their absolute values differ). Accepts a vector.quantizeint8() result or a raw int8 byte array.

  • 'BINARY' - Very fast Hamming distance, 8x fewer operations. Requires the vector.quantizebinary() result object (raw packed bits are not enough - the Hamming normalization needs the original length).

Returns: Double - Approximate distance

-- mode inferred from the quantization result objects
SELECT vector.approxdistance(
  vector.quantizeint8([1.0, 2.0, 3.0]),
  vector.quantizeint8([1.0, 3.0, 3.0])
)
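
The reason BINARY mode needs the original length can be seen in a small Python sketch of a normalized Hamming distance over packed bits. Everything here is an assumption for illustration: the bit order, the packing layout, and the normalization by length are not specified in this section, and the function names are hypothetical.

```python
def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, most significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)

def hamming_distance(packed_a, packed_b, length):
    """Fraction of the `length` dimensions whose bits differ."""
    differing = sum(bin(a ^ b).count("1") for a, b in zip(packed_a, packed_b))
    return differing / length

a = pack_bits([1, 0, 1])
b = pack_bits([1, 1, 0])
distance = hamming_distance(a, b, 3)
```

Three dimensions occupy a full byte once packed, so the byte count alone cannot tell 3 dimensions from 8; dividing by the true length keeps the distance comparable across vectors.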

Sparse Vectors

vector.densetosparse()

Converts dense vector to sparse representation.

Syntax: vector.densetosparse(<vector>, [threshold])

Parameters:

  • threshold - Values below this are considered zero (default: 0.0)

Returns: SparseVector - Sparse representation

SELECT vector.densetosparse([0.5, 0.0, 0.1], 0.2)
-- Only keeps elements >= 0.2

vector.sparsetodense()

Converts sparse vector back to dense representation.

Syntax: vector.sparsetodense(<sparsevector>)

Returns: Vector - Dense vector

SELECT vector.sparsetodense(vector.sparsecreate([0, 2], [0.5, 0.3]))

vector.sparsecreate()

Creates sparse vector from indices and values.

Syntax: vector.sparsecreate(<indices>, <values>, [dimension])

Returns: SparseVector - Sparse vector

SELECT vector.sparsecreate([0, 2, 5], [0.5, 0.3, 0.8], 7)

vector.sparsedot()

Computes dot product between two sparse vectors.

Syntax: vector.sparsedot(<sparse1>, <sparse2>)

Returns: Double - Dot product

SELECT vector.sparsedot(
  vector.densetosparse([1.0, 2.0, 3.0]),
  vector.densetosparse([1.0, 1.0, 1.0])
)
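
A sparse vector can be pictured as a mapping from index to non-zero value plus a dimension. The Python sketch below uses that picture to illustrate creation, densification, and the dot product; the dictionary layout and the default dimension (largest index + 1) are assumptions, not the internal SparseVector format.

```python
def sparse_create(indices, values, dimension=None):
    """Build a sparse vector from parallel index and value lists."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    if dimension is None:
        dimension = max(indices) + 1 if indices else 0  # assumed default
    return {"dimension": dimension, "entries": dict(zip(indices, values))}

def sparse_to_dense(sparse):
    """Expand to a full list, filling missing positions with 0.0."""
    dense = [0.0] * sparse["dimension"]
    for index, value in sparse["entries"].items():
        dense[index] = value
    return dense

def sparse_dot(a, b):
    """Only indices present in BOTH vectors contribute to the dot product."""
    small, large = sorted((a["entries"], b["entries"]), key=len)
    return sum(value * large[index] for index, value in small.items() if index in large)
```

Iterating over the smaller entry set keeps the cost proportional to the sparser of the two vectors.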

vector.sparsity()

Returns a sparsity measure of a vector. The threshold is optional and defaults to sqrt(eps) for float (~3.45e-4). An optional third argument selects the measure.

Syntax: vector.sparsity(<vector> [, <threshold> [, <mode>]])

Modes:

  • 'FRACTION' (default) - fraction of elements with |x| < threshold, in [0, 1]

  • 'L0' - the L0 pseudonorm: count of "significant" elements with |x| >= threshold (integer)

  • 'GMEAN' - geometric mean of the absolute values, exp(mean(ln|x|)); returns 0 if any element is 0

Returns: Double (FRACTION, GMEAN) or Integer (L0)

SELECT vector.sparsity([0.01, 0.1, 0.05, 0.02], 0.06)
-- Returns: 0.75 (3 of 4 elements are below 0.06)

SELECT vector.sparsity([0.0, 1.0, 2.0, 3.0])
-- Returns: 0.25 (uses the default threshold; one near-zero element)

SELECT vector.sparsity([0.0, 1.0, 2.0, 3.0], 0.5, 'L0')
-- Returns: 3 (three significant elements)

SELECT vector.sparsity([1.0, 2.0, 4.0], 0.0, 'GMEAN')
-- Returns: 2.0 (geometric mean)
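
The three modes can be sketched in Python as follows. The function name is illustrative, and the default threshold is computed as the square root of the single-precision machine epsilon (2^-23), which matches the ~3.45e-4 figure quoted above.

```python
import math

DEFAULT_THRESHOLD = math.sqrt(2.0 ** -23)  # sqrt(float eps), about 3.45e-4

def sparsity(v, threshold=DEFAULT_THRESHOLD, mode="FRACTION"):
    """Sparsity measure of a vector under the selected mode."""
    if mode == "FRACTION":
        # Share of elements that count as zero
        return sum(1 for x in v if abs(x) < threshold) / len(v)
    if mode == "L0":
        # Count of significant elements
        return sum(1 for x in v if abs(x) >= threshold)
    if mode == "GMEAN":
        if any(x == 0 for x in v):
            return 0.0
        return math.exp(sum(math.log(abs(x)) for x in v) / len(v))
    raise ValueError(f"unknown mode: {mode}")
```

FRACTION and L0 are complementary views of the same threshold test: for a vector of n elements, L0 equals n * (1 - FRACTION).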

Statistics

vector.variance()

Computes variance of vector elements.

Syntax: vector.variance(<vector>)

Returns: Double - Variance

SELECT vector.variance([1.0, 2.0, 3.0])

vector.stddev()

Computes standard deviation of vector elements.

Syntax: vector.stddev(<vector>)

Returns: Double - Standard deviation

SELECT vector.stddev([1.0, 2.0, 3.0])

vector.hasnan()

Checks if vector contains NaN values. A NULL element (e.g. produced by an invalid math op such as sqrt(-1.0) that the engine coerces to NULL inside a collection) is treated as NaN.

Syntax: vector.hasnan(<vector>)

Returns: Boolean - true if contains NaN

SELECT vector.hasnan([1.0, 2.0, 3.0])
-- Returns: false

SELECT vector.hasnan([1.0, sqrt(-1.0), 3.0])
-- Returns: true (the NULL element counts as NaN)

vector.hasinf()

Checks if vector contains infinity values. A NULL element is treated as NaN, which is not infinite, so it does not trigger a true result.

Syntax: vector.hasinf(<vector>)

Returns: Boolean - true if contains infinity

SELECT vector.hasinf([1.0, 10e400, 2.0])
-- Returns: true

vector.hasnull()

Checks if a vector contains any NULL element. Unlike vector.hasnan(), this distinguishes a genuine NULL (a missing element) from a NaN float value. Primitive float[]/double[]/int[]/long[] inputs can never hold a NULL and therefore always return false.

Syntax: vector.hasnull(<vector>)

Returns: Boolean - true if contains a NULL element

SELECT vector.hasnull([1.0, 2.0, 3.0])
-- Returns: false

SELECT vector.hasnull([1.0, sqrt(-1.0), 3.0])
-- Returns: true
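
The difference between the three checks is easiest to see side by side. In this Python sketch, None stands in for a NULL element; the function names are illustrative.

```python
import math

def has_nan(v):
    """True if any element is NaN; a NULL (None) element counts as NaN."""
    return any(x is None or math.isnan(x) for x in v)

def has_inf(v):
    """True if any element is +/- infinity; NULL elements never match."""
    return any(x is not None and math.isinf(x) for x in v)

def has_null(v):
    """True only for a genuine NULL element, not for a NaN float."""
    return any(x is None for x in v)
```

A vector containing a NaN float is flagged by has_nan but not by has_null, while a vector containing a NULL is flagged by both.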

Utility Functions

vector.tostring()

Converts vector to string representation. Equivalent to the asString() method on a vector value (e.g. embedding.asString('PYTHON')); both share the same formatter.

Syntax: vector.tostring(<vector>, [format])

Formats:

  • 'COMPACT' - Single line [1.0, 2.0, 3.0] (default)

  • 'PRETTY' - Multi-line with formatting

  • 'PYTHON' - Python list format

  • 'MATLAB' - space-separated row vector [1.0 2.0 3.0]

  • 'MATLAB_COLUMN' - semicolon-separated column vector [1.0; 2.0; 3.0]

  • 'JULIA' - Julia vector literal [1.0, 2.0, 3.0]

  • 'NUMPY' - bare comma-separated 1.0, 2.0, 3.0 (no brackets), for numpy.fromstring(…​, sep=",")

Returns: String - Formatted vector

SELECT vector.tostring([0.5, 0.25, 0.75], 'PYTHON')

SELECT vector.tostring([0.5, 0.25, 0.75], 'NUMPY')
-- Returns: 0.5, 0.25, 0.75