Indexes

SQL - `CREATE INDEX`

Creates a new index. Indexes can be:

Unique Where they don’t allow duplicates (LSM Tree).
Not Unique Where they allow duplicates (LSM Tree).
Unique Hash O(1) equality lookups, no duplicates (Hash Index).
Not Unique Hash O(1) equality lookups, allows duplicates (Hash Index).
Full Text Where they index any single word of text.
Geospatial Where they index WKT geometry strings for spatial queries.
HNSW (Hierarchical Navigable Small World) vector index.
LSMVectorIndex LSM Tree-based vector index for persistent similarity search.

There are several index algorithms available to determine how ArcadeDB indexes your database. For more information on these, see Indexes.

Syntax

CREATE INDEX [<manual-index-name>]
[ IF NOT EXISTS ]
[ ON <type> (<property> [BY KEY|VALUE|ITEM] [COLLATE CI][,]*) ]
<index-type> [<key-type>]
[ NULL_STRATEGY SKIP|ERROR|INDEX]

<manual-index-name> Only for manual indexes, defines the logical name for the index. For automatic indexes, the index name is assigned automatically by ArcadeDB at creation as <type>[<property>[,]*]. For example, the index created on type Friend, properties "firstName" and "lastName", it will be named "Friend[firstName,lastName]"
IF NOT EXISTS Specifying this option, the index creation will just be ignored if the index already exists (instead of failing with an error)
<type> Defines the type to create an automatic index for. The type must already exist.
<property> Defines the property you want to automatically index. The property must already exist.
- BY KEY index by key names for map properties.
- BY VALUE index by values for map properties.
- BY ITEM index by values for list elements.
- COLLATE CI enables case-insensitive collation for this property. Values are stored in lowercase internally, so lookups match regardless of case. Can be specified independently for each property in a composite index. See concepts/indexes.adoc#case-insensitive-indexes for details.
<index-type> Defines the index type you want to use:
- UNIQUE does not allow duplicate keys (LSM Tree),
- NOTUNIQUE allows duplicate keys (LSM Tree),
- UNIQUE_HASH does not allow duplicate keys (Hash Index — O(1) equality lookups, no range queries),
- NOTUNIQUE_HASH allows duplicate keys (Hash Index — O(1) equality lookups, no range queries),
- FULL_TEXT based on any single word of text. Uses Apache Lucene for tokenization and supports configurable analyzers, multi-property indexing, advanced query syntax, and BM25 relevance scoring. See how-to/data-modeling/full-text-index.adoc#full-text-index for full documentation.
- GEOSPATIAL indexes WKT geometry strings stored in STRING properties, enabling spatial queries via the geo.* SQL function namespace. See how-to/data-modeling/geospatial.adoc#geospatial for full documentation.
- HNSW vector index (Hierarchical Navigable Small World).
- LSMVECTORINDEX LSM Tree-based vector index.
- LSM_SPARSE_VECTOR posting-list inverted index for sparse-vector retrieval (BM25-style, SPLADE, BGE-M3, OpenSearch sparse). Available since v26.5.1. See Sparse Vector Search for full documentation.
<key-type> Defines the key type. With automatic indexes, the key type is automatically selected when the database reads the target schema property. For manual indexes, when not specified, it selects the key at run-time during the first insertion by reading the type of the type. In creating composite indexes, it uses a comma-separated list of types.

To create an automatic index bound to the schema property, use the ON clause. In order to create an index, the schema must already exist in your database.

In the event that the ON and <key-type> clauses both exist, the database validates the specified property types. If the property types don’t equal those specified in the key type list, it throws an exception.

The NULL_STRATEGY determines how null values are indexed:

SKIP (default) excludes null values from the index,
ERROR throws an error in case a null value is found during indexing,
INDEX adds null values to the index.

A unique index does not regard derived types or embedded documents of the indexed type.

You can use list key types when creating manual composite indexes, but bear in mind that such indexes are not yet fully supported.

Examples

Create an automatic index bound to the new property id in the type User:

ArcadeDB> CREATE PROPERTY User.id BINARY
ArcadeDB> CREATE INDEX ON User (id) UNIQUE

Create a series of automatic indexes for the thumbs map property in the type Movie. BY KEY indexes the map’s keys, BY VALUE its values. They are usually NOTUNIQUE, since the same key (or value) normally appears across many records:

ArcadeDB> CREATE INDEX ON Movie (thumbs BY KEY) NOTUNIQUE
ArcadeDB> CREATE INDEX ON Movie (thumbs BY VALUE) NOTUNIQUE

Once created, these indexes are automatically used when filtering a map by key or by value:

ArcadeDB> SELECT FROM Movie WHERE thumbs CONTAINSKEY 'poster'
ArcadeDB> SELECT FROM Movie WHERE thumbs CONTAINSVALUE 'poster.png'

Create a index for the actors list property in the type Movie. BY ITEM indexes each element of the list, so a filter on a single value uses the index:

ArcadeDB> CREATE INDEX ON Movie (actors BY ITEM) NOTUNIQUE
ArcadeDB> SELECT FROM Movie WHERE actors = 'Keanu Reeves'

Create a index for the actors.name list of embedded document (or map) property in the type Movie:

ArcadeDB> CREATE INDEX ON Movie (actors.name BY ITEM) NOTUNIQUE

Create a series of properties and on them create a composite index:

ArcadeDB> CREATE PROPERTY Book.author STRING
ArcadeDB> CREATE PROPERTY Book.title STRING
ArcadeDB> CREATE PROPERTY Book.publicationYears LIST
ArcadeDB> CREATE INDEX ON Book (author, title, publicationYears) UNIQUE

Create an index on an edge’s date range:

ArcadeDB> CREATE VERTEX TYPE File
ArcadeDB> CREATE EDGE TYPE Has
ArcadeDB> CREATE PROPERTY Has.started DATETIME
ArcadeDB> CREATE PROPERTY Has.ended DATETIME
ArcadeDB> CREATE INDEX ON Has (started, ended) NOTUNIQUE

You can create indexes on edge types only if they contain the begin and end date range of validity. This is use case is very common with historical graphs, such as the example above.

Using the above index, retrieve all the edges that existed in the year 2014:

ArcadeDB> SELECT FROM Has WHERE started >= '2014-01-01 00:00:00.000' AND
            ended < '2015-01-01 00:00:00.000'

Using the above index, retrieve all edges that existed in 2014 and write them to the parent file:

ArcadeDB> SELECT outV() FROM Has WHERE started >= '2014-01-01 00:00:00.000'
            AND ended < '2015-01-01 00:00:00.000'

Using the above index, retrieve all the 2014 edges and connect them to children files:

ArcadeDB> SELECT inV() FROM Has WHERE started >= '2014-01-01 00:00:00.000'
            AND ended < '2015-01-01 00:00:00.000'

Create an index that includes null values.

By default, indexes ignore null values. Queries against null values that use an index returns no entries. To return an error in case of null values, append NULL_STRATEGY ERROR when you create the index.

ArcadeDB> CREATE INDEX ON Employee (address) NOTUNIQUE NULL_STRATEGY ERROR

Create a case-insensitive index on a string property:

ArcadeDB> CREATE PROPERTY Product.Name STRING
ArcadeDB> CREATE INDEX ON Product (Name COLLATE CI) NOTUNIQUE

Queries like SELECT FROM Product WHERE Name = 'Hello World' will now match "Hello World", "HELLO WORLD", "hello world", etc.

Create a case-insensitive unique index to prevent duplicate entries ignoring case:

ArcadeDB> CREATE PROPERTY User.username STRING
ArcadeDB> CREATE INDEX ON User (username COLLATE CI) UNIQUE

This prevents inserting both "Admin" and "admin", since they are treated as the same key.

Create a composite index with case-insensitive collation on some properties only:

ArcadeDB> CREATE PROPERTY Product.Name STRING
ArcadeDB> CREATE PROPERTY Product.Code STRING
ArcadeDB> CREATE INDEX ON Product (Name COLLATE CI, Code) UNIQUE

In this example, Name uses case-insensitive collation while Code uses the default (case-sensitive) collation.

When a COLLATE CI index exists on a property, the query optimizer also recognizes queries that use .toLowerCase() (e.g., WHERE Name.toLowerCase() = 'hello world') and will use the case-insensitive index for these queries.

Create a unique hash index for O(1) primary key lookups:

ArcadeDB> CREATE PROPERTY User.email STRING
ArcadeDB> CREATE INDEX ON User (email) UNIQUE_HASH

Create a non-unique hash index for fast equality lookups:

ArcadeDB> CREATE PROPERTY Order.status STRING
ArcadeDB> CREATE INDEX ON Order (status) NOTUNIQUE_HASH

Hash indexes use extendable hashing and provide O(1) equality lookups — significantly faster than LSM Tree for point queries (WHERE key = value). However, they do not support range scans (<, >, BETWEEN) or ordered iteration (ORDER BY). Use UNIQUE/NOTUNIQUE (LSM Tree) when you need range queries or ordering.

Create a full-text index and search it using SEARCH_INDEX():

ArcadeDB> CREATE INDEX ON Employee (address) FULL_TEXT

SELECT * FROM Employee
WHERE SEARCH_INDEX('Employee[address]', 'New York')

LIKE and ILIKE do not use full-text indexes. See how-to/data-modeling/full-text-index.adoc#full-text-index for analyzer configuration, Lucene query syntax, relevance scoring, and More Like This.

Create a manual index to store dates:

ArcadeDB> CREATE INDEX `mostRecentRecords` UNIQUE DATE

Create a basic LSMVectorIndex for similarity search on vector embeddings:

ArcadeDB> CREATE VERTEX TYPE Doc
ArcadeDB> CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {dimensions: 100, similarity: 'COSINE'}

By default, CREATE INDEX … LSM_VECTOR builds the HNSW graph immediately, so the index is ready to query as soon as the command completes. This is the recommended behavior for most use cases. If you prefer to defer graph construction to the first search (lazy mode), set buildGraphNow: false in the metadata.

Create an LSMVectorIndex with deferred graph building (useful when bulk-loading data after index creation):

ArcadeDB> CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {dimensions: 100, similarity: 'COSINE', buildGraphNow: false}

The graph will be built lazily on the first vector.neighbors() query or when REBUILD INDEX is executed.

Create an LSMVectorIndex with DOT_PRODUCT similarity metric:

ArcadeDB> CREATE VERTEX TYPE Image
ArcadeDB> CREATE PROPERTY Image.featureVector ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON Image (featureVector) LSM_VECTOR METADATA {dimensions: 512, similarity: 'DOT_PRODUCT'}

Create an LSMVectorIndex with EUCLIDEAN distance metric and custom performance parameters:

ArcadeDB> CREATE VERTEX TYPE Product
ArcadeDB> CREATE PROPERTY Product.attributes ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON Product (attributes) LSM_VECTOR METADATA {dimensions: 256, similarity: 'EUCLIDEAN', maxConnections: 32, beamWidth: 200}

Create an LSMVectorIndex with minimal metadata:

ArcadeDB> CREATE VERTEX TYPE CompactDoc
ArcadeDB> CREATE PROPERTY CompactDoc.vec ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX IF NOT EXISTS ON CompactDoc (vec) LSM_VECTOR METADATA {dimensions: 4, similarity: 'COSINE'}

Create an LSMVectorIndex with tuned performance parameters for large-scale indexing:

ArcadeDB> CREATE VERTEX TYPE VectorVertex
ArcadeDB> CREATE PROPERTY VectorVertex.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX IF NOT EXISTS ON VectorVertex (embedding) LSM_VECTOR METADATA {dimensions: 1024, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}

Create an LSMVectorIndex with a custom efSearch for consistent high recall:

ArcadeDB> CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {dimensions: 768, similarity: 'COSINE', quantization: 'INT8', efSearch: 200}

When efSearch is set in the index metadata, all queries on this index use the specified beam width by default. Individual queries can still override it by passing efSearch as the 4th argument to vector.neighbors().

Create an LSMVectorIndex on embeddings with metadata about source:

ArcadeDB> CREATE VERTEX TYPE Doc
ArcadeDB> CREATE PROPERTY Doc.content STRING
ArcadeDB> CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE PROPERTY Doc.source STRING
ArcadeDB> CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {dimensions: 384, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}

Then query for similar documents:

ArcadeDB> SELECT expand(vector.neighbors('Doc[embedding]', [0.1, 0.2, 0.3], 10))

Create multiple LSMVectorIndexes on different properties in the same type (multi-modal embeddings):

ArcadeDB> CREATE VERTEX TYPE MultiModalRecord
ArcadeDB> CREATE PROPERTY MultiModalRecord.imageEmbedding ARRAY_OF_FLOATS
ArcadeDB> CREATE PROPERTY MultiModalRecord.textEmbedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON MultiModalRecord (imageEmbedding) LSM_VECTOR METADATA {dimensions: 512, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}
ArcadeDB> CREATE INDEX ON MultiModalRecord (textEmbedding) LSM_VECTOR METADATA {dimensions: 768, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}

Query for similar records using the image embedding:

ArcadeDB> SELECT expand(vector.neighbors('MultiModalRecord[imageEmbedding]', [0.1, 0.2, 0.3], 5))

Create an LSMVectorIndex with tuned parameters for high-dimensional data:

ArcadeDB> CREATE VERTEX TYPE Word
ArcadeDB> CREATE PROPERTY Word.vector ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON Word (vector) LSM_VECTOR METADATA {dimensions: 100, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}

Create an LSMVectorIndex with INT8 quantization for 4x memory savings:

ArcadeDB> CREATE VERTEX TYPE LargeCorpus
ArcadeDB> CREATE PROPERTY LargeCorpus.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON LargeCorpus (embedding) LSM_VECTOR METADATA {dimensions: 1536, similarity: 'COSINE', quantization: 'INT8'}

The index will automatically quantize vectors on insert and dequantize on retrieval, reducing memory usage by 75% with minimal accuracy loss (<2%).

Create an LSMVectorIndex with BINARY quantization for extreme compression (32x memory savings):

ArcadeDB> CREATE VERTEX TYPE MassiveDataset
ArcadeDB> CREATE PROPERTY MassiveDataset.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON MassiveDataset (embedding) LSM_VECTOR METADATA {dimensions: 1536, similarity: 'COSINE', quantization: 'BINARY'}

Binary quantization compresses each dimension to 1 bit, achieving 97% memory reduction. Suitable for approximate search with reranking.

Create an LSMVectorIndex with PRODUCT quantization for zero-disk-I/O graph construction on large datasets:

ArcadeDB> CREATE VERTEX TYPE BigDataset
ArcadeDB> CREATE PROPERTY BigDataset.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON BigDataset (embedding) LSM_VECTOR METADATA {dimensions: 1024, similarity: 'COSINE', quantization: 'PRODUCT'}

Product quantization builds the graph using compact PQ codes that stay in memory, eliminating disk I/O during construction. Most effective on large datasets (100K+ vectors).

Quantization types: NONE (default, no compression), INT8 (recommended, 4x compression, <2% accuracy loss), BINARY (32x compression, approximate search), PRODUCT (16-64x compression, zero-disk-I/O graph build). Quantization is transparent - vectors are stored compressed but queries work the same way.

Create an LSMVectorIndex with inline vector storage for large-scale deployments (100K+ vectors):

ArcadeDB> CREATE VERTEX TYPE LargeScale
ArcadeDB> CREATE PROPERTY LargeScale.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON LargeScale (embedding) LSM_VECTOR METADATA {
  dimensions: 1536,
  similarity: 'COSINE',
  quantization: 'INT8',
  storeVectorsInGraph: true,
  mutationsBeforeRebuild: 1000
}

The storeVectorsInGraph option stores vectors inline within the JVector graph file, eliminating expensive document lookups during search operations. This dramatically reduces RAM pressure for large indexes (100K+ vectors) by avoiding document page cache thrashing.

When to use storeVectorsInGraph:

Enable (true) for: Large indexes (100K+ vectors), high search throughput, read-heavy workloads, RAM-constrained environments
Disable (false, default) for: Small indexes (<10K vectors), frequent vector updates, write-heavy workloads

Trade-offs:

Benefit: 70-80% RAM reduction for large indexes by co-locating vectors with graph topology
Cost: Vectors duplicated on disk (stored in both documents and graph file)
Best with: Combine with quantization=INT8 or BINARY for maximum efficiency (reduced disk + reduced RAM)

Example - optimal large-scale configuration:

ArcadeDB> CREATE INDEX ON LargeScale (embedding) LSM_VECTOR METADATA {
  dimensions: 1536,
  similarity: 'COSINE',
  quantization: 'INT8',              -- 4x memory reduction (recommended)
  storeVectorsInGraph: true,         -- Fast co-located access
  efSearch: 200,                     -- Consistent high recall
  maxConnections: 32,                -- Vamana per-layer degree (default); = hnswlib M=16
  beamWidth: 100,
  locationCacheSize: 200000,         -- Cache hot vectors
  mutationsBeforeRebuild: 1000,      -- Reduce rebuild frequency
  inactivityRebuildTimeoutMs: 30000  -- Flush buffered vectors after 30s of inactivity
}

Create a geospatial index on a WKT property and query it:

CREATE VERTEX TYPE Location
CREATE PROPERTY Location.coords STRING
CREATE INDEX ON Location (coords) GEOSPATIAL

INSERT INTO Location SET name = 'Eiffel Tower', coords = 'POINT(2.2945 48.8584)'

SELECT name FROM Location
WHERE geo.within(coords, geo.geomFromText(
  'POLYGON((2.28 48.84, 2.42 48.84, 2.42 48.90, 2.28 48.90, 2.28 48.84))'
)) = true

See how-to/data-modeling/geospatial.adoc#geospatial for full documentation on geometry formats, predicate functions, and precision configuration.

Create a sparse vector index for SPLADE / BM25 / BGE-M3 hybrid retrieval. Available since v26.5.1.

CREATE DOCUMENT TYPE Doc
CREATE PROPERTY Doc.tokens  ARRAY_OF_INTEGERS
CREATE PROPERTY Doc.weights ARRAY_OF_FLOATS

CREATE INDEX ON Doc (tokens, weights) LSM_SPARSE_VECTOR
  METADATA { dimensions: 105000, modifier: 'IDF', weightQuantization: 'FP32' }

The two parallel array properties carry the non-zero dimension ids and matching weights for each record. dimensions declares the vocabulary cap and rejects out-of-range ids at write time (set to 0 to disable the cap). modifier: 'IDF' enables Robertson-Sparck-Jones IDF weighting at query time; omit it (or set to 'NONE') for plain dot-product scoring.

weightQuantization (since v26.7.2) controls how posting weights are stored on disk, bringing the sparse index to parity with the dense LSM_VECTOR index’s quantization knob:

Value Storage / trade-off

Value	Storage / trade-off
`INT8`	1 byte per weight (the default, and the only behavior before v26.7.2). Smallest index; scores are approximate.
`FP16`	2 bytes per weight. Half the size of `FP32` with near-exact scoring.
`FP32`	4 bytes per weight. Exact scoring; largest index. Choose this when approximate weights measurably hurt ranking quality.

INT8

1 byte per weight (the default, and the only behavior before v26.7.2). Smallest index; scores are approximate.

FP16

2 bytes per weight. Half the size of FP32 with near-exact scoring.

FP32

4 bytes per weight. Exact scoring; largest index. Choose this when approximate weights measurably hurt ranking quality.

Omit weightQuantization to keep the INT8 default. The setting is fixed at index-creation time; to change it, drop and recreate (or REBUILD INDEX) the index.

Query with vector.sparseNeighbors:

SELECT expand(`vector.sparseNeighbors`('Doc[tokens,weights]', :qIdx, :qVal, 50))

For server-side hybrid retrieval (dense + sparse + full-text fused into one ranked top-K), see Hybrid Search with vector.fuse.

After a large bulk load, run COMPACT INDEX to merge the index segments and get lower, more stable query latency without waiting for the background compaction.

For more information, see:

COMPACT INDEX
DROP INDEX
Indexes

SQL - `REBUILD INDEX`

Rebuilds automatic indexes.

Syntax

REBUILD INDEX <index-name>

<index-name> It is the index name that you want to rebuild. Use * to rebuild all automatic indexes. Quote the index name if it contains special characters like square brackets.

During the rebuild, any idempotent queries made against the index, skip the index and perform sequential scans. This means that queries run slower during this operation. Non-idempotent commands, such as INSERT, UPDATE, and DELETE are blocked waiting until the indexes are rebuilt.

During normal operations an index rebuild is not necessary. Rebuild an index only if it breaks.

To build a large LSM_TREE index (unique or non-unique) over data that is already loaded (single-server, no active transaction), the Java API offers a much faster offline sorted bulk build that constructs the index directly in compacted form. See Sorted bulk build.

Examples

Rebuild an index on the email property on the type Profile:

ArcadeDB> REBUILD INDEX `Profile[email]`

Rebuild all indexes:

ArcadeDB> REBUILD INDEX *

For more information, see:

CREATE INDEX
COMPACT INDEX
DROP INDEX
Indexes

SQL - `COMPACT INDEX`

Forces an immediate compaction of automatic indexes, merging their on-disk segments into a compacted form.

ArcadeDB compacts indexes automatically in the background. COMPACT INDEX lets you trigger it on demand, which is useful right after a bulk load to settle an index and get lower, more stable query latency without waiting for the background schedule. Unlike REBUILD INDEX, it does not re-index the data: it only merges the existing segments, so it is much faster.

It works for standard LSM-Tree indexes as well as vector and sparse-vector indexes, and runs synchronously (it returns when the compaction has finished). Because it is a regular SQL command, it can be executed from any client, including the HTTP /command endpoint.

Syntax

COMPACT INDEX <index-name>

<index-name> It is the index name that you want to compact. Use * to compact all automatic indexes. Quote the index name if it contains special characters like square brackets.

Examples

Compact the sparse-vector index on the tokens and weights properties of the type Doc:

ArcadeDB> COMPACT INDEX `Doc[tokens,weights]`

Compact all indexes:

ArcadeDB> COMPACT INDEX *

For more information, see:

CREATE INDEX
REBUILD INDEX
Indexes

SQL - `DROP INDEX`

Removes an index from a property defined in the schema.

If the index does not exist, this call just returns with no errors.

Syntax

DROP INDEX <index-name> [ IF EXISTS ]

<index-name> Defines the name of the index.

Examples

Remove the index on the Id property of the Users type:

ArcadeDB> DROP INDEX `Users[Id]`

For more information, see:

CREATE INDEX
Indexes

Indexes

SQL - CREATE INDEX

SQL - REBUILD INDEX

SQL - COMPACT INDEX

SQL - DROP INDEX

SQL - `CREATE INDEX`

SQL - `REBUILD INDEX`

SQL - `COMPACT INDEX`

SQL - `DROP INDEX`