Indexes
SQL - CREATE INDEX
Creates a new index. Indexes can be:
-
Unique Where they don’t allow duplicates (LSM Tree).
-
Not Unique Where they allow duplicates (LSM Tree).
-
Unique Hash O(1) equality lookups, no duplicates (Hash Index).
-
Not Unique Hash O(1) equality lookups, allows duplicates (Hash Index).
-
Full Text Where they index any single word of text.
-
Geospatial Where they index WKT geometry strings for spatial queries.
-
HNSW (Hierarchical Navigable Small World) vector index.
-
LSMVectorIndex LSM Tree-based vector index for persistent similarity search.
There are several index algorithms available to determine how ArcadeDB indexes your database. For more information on these, see Indexes.
Syntax
CREATE INDEX [<manual-index-name>]
[ IF NOT EXISTS ]
[ ON <type> (<property> [BY KEY|VALUE|ITEM] [COLLATE CI][,]*) ]
<index-type> [<key-type>]
[ NULL_STRATEGY SKIP|ERROR|INDEX]
-
<manual-index-name>Only for manual indexes, defines the logical name for the index. For automatic indexes, the index name is assigned automatically by ArcadeDB at creation as<type>[<property>[,]*]. For example, the index created on type Friend, properties "firstName" and "lastName", it will be named "Friend[firstName,lastName]" -
IF NOT EXISTSSpecifying this option, the index creation will just be ignored if the index already exists (instead of failing with an error) -
<type>Defines the type to create an automatic index for. The type must already exist. -
<property>Defines the property you want to automatically index. The property must already exist.-
BY KEYindex by key names for map properties. -
BY VALUEindex by values for map properties. -
BY ITEMindex by values for list elements. -
COLLATE CIenables case-insensitive collation for this property. Values are stored in lowercase internally, so lookups match regardless of case. Can be specified independently for each property in a composite index. See concepts/indexes.adoc#case-insensitive-indexes for details.
-
-
<index-type>Defines the index type you want to use:-
UNIQUEdoes not allow duplicate keys (LSM Tree), -
NOTUNIQUEallows duplicate keys (LSM Tree), -
UNIQUE_HASHdoes not allow duplicate keys (Hash Index — O(1) equality lookups, no range queries), -
NOTUNIQUE_HASHallows duplicate keys (Hash Index — O(1) equality lookups, no range queries), -
FULL_TEXTbased on any single word of text. Uses Apache Lucene for tokenization and supports configurable analyzers, multi-property indexing, and advanced query syntax. See how-to/data-modeling/full-text-index.adoc#full-text-index for full documentation. -
GEOSPATIALindexes WKT geometry strings stored inSTRINGproperties, enabling spatial queries via thegeo.*SQL function namespace. See how-to/data-modeling/geospatial.adoc#geospatial for full documentation. -
HNSWvector index (Hierarchical Navigable Small World). -
LSMVECTORINDEXLSM Tree-based vector index.
-
-
<key-type>Defines the key type. With automatic indexes, the key type is automatically selected when the database reads the target schema property. For manual indexes, when not specified, it selects the key at run-time during the first insertion by reading the type of the type. In creating composite indexes, it uses a comma-separated list of types.
To create an automatic index bound to the schema property, use the ON clause.
In order to create an index, the schema must already exist in your database.
In the event that the ON and <key-type> clauses both exist, the database validates the specified property types.
If the property types don’t equal those specified in the key type list, it throws an exception.
The NULL_STRATEGY determines how null values are indexed:
-
SKIP(default) excludes null values from the index, -
ERRORthrows an error in case a null value is found during indexing, -
INDEXadds null values to the index.
| A unique index does not regard derived types or embedded documents of the indexed type. |
You can use list key types when creating manual composite indexes, but bear in mind that such indexes are not yet fully supported.
Examples
-
Create an automatic index bound to the new property
idin the typeUser:
ArcadeDB> CREATE PROPERTY User.id BINARY ArcadeDB> CREATE INDEX ON User (id) UNIQUE
-
Create a series automatic indexes for the
thumbsproperty in the typeMovie:
ArcadeDB> CREATE INDEX ON Movie (thumbs) UNIQUE ArcadeDB> CREATE INDEX ON Movie (thumbs BY KEY) UNIQUE ArcadeDB> CREATE INDEX ON Movie (thumbs BY VALUE) UNIQUE
-
Create a index for the
actorslist property in the typeMovie:
ArcadeDB> CREATE INDEX ON Movie (actors BY ITEM) NOTUNIQUE
-
Create a index for the
actors.namelist of embedded document (or map) property in the typeMovie:
ArcadeDB> CREATE INDEX ON Movie (actors.name BY ITEM) NOTUNIQUE
-
Create a series of properties and on them create a composite index:
ArcadeDB> CREATE PROPERTY Book.author STRING ArcadeDB> CREATE PROPERTY Book.title STRING ArcadeDB> CREATE PROPERTY Book.publicationYears LIST ArcadeDB> CREATE INDEX ON Book (author, title, publicationYears) UNIQUE
-
Create an index on an edge’s date range:
ArcadeDB> CREATE VERTEX TYPE File ArcadeDB> CREATE EDGE TYPE Has ArcadeDB> CREATE PROPERTY Has.started DATETIME ArcadeDB> CREATE PROPERTY Has.ended DATETIME ArcadeDB> CREATE INDEX ON Has (started, ended) NOTUNIQUE
| You can create indexes on edge types only if they contain the begin and end date range of validity. This is use case is very common with historical graphs, such as the example above. |
-
Using the above index, retrieve all the edges that existed in the year 2014:
ArcadeDB> SELECT FROM Has WHERE started >= '2014-01-01 00:00:00.000' AND
ended < '2015-01-01 00:00:00.000'
-
Using the above index, retrieve all edges that existed in 2014 and write them to the parent file:
ArcadeDB> SELECT outV() FROM Has WHERE started >= '2014-01-01 00:00:00.000'
AND ended < '2015-01-01 00:00:00.000'
-
Using the above index, retrieve all the 2014 edges and connect them to children files:
ArcadeDB> SELECT inV() FROM Has WHERE started >= '2014-01-01 00:00:00.000'
AND ended < '2015-01-01 00:00:00.000'
-
Create an index that includes null values.
By default, indexes ignore null values.
Queries against null values that use an index returns no entries.
To return an error in case of null values, append NULL_STRATEGY ERROR when you create the index.
ArcadeDB> CREATE INDEX ON Employee (address) NOTUNIQUE NULL_STRATEGY ERROR
-
Create a case-insensitive index on a string property:
ArcadeDB> CREATE PROPERTY Product.Name STRING ArcadeDB> CREATE INDEX ON Product (Name COLLATE CI) NOTUNIQUE
Queries like SELECT FROM Product WHERE Name = 'Hello World' will now match "Hello World", "HELLO WORLD", "hello world", etc.
-
Create a case-insensitive unique index to prevent duplicate entries ignoring case:
ArcadeDB> CREATE PROPERTY User.username STRING ArcadeDB> CREATE INDEX ON User (username COLLATE CI) UNIQUE
This prevents inserting both "Admin" and "admin", since they are treated as the same key.
-
Create a composite index with case-insensitive collation on some properties only:
ArcadeDB> CREATE PROPERTY Product.Name STRING ArcadeDB> CREATE PROPERTY Product.Code STRING ArcadeDB> CREATE INDEX ON Product (Name COLLATE CI, Code) UNIQUE
In this example, Name uses case-insensitive collation while Code uses the default (case-sensitive) collation.
When a COLLATE CI index exists on a property, the query optimizer also recognizes queries that use .toLowerCase() (e.g., WHERE Name.toLowerCase() = 'hello world') and will use the case-insensitive index for these queries.
|
-
Create a unique hash index for O(1) primary key lookups:
ArcadeDB> CREATE PROPERTY User.email STRING ArcadeDB> CREATE INDEX ON User (email) UNIQUE_HASH
-
Create a non-unique hash index for fast equality lookups:
ArcadeDB> CREATE PROPERTY Order.status STRING ArcadeDB> CREATE INDEX ON Order (status) NOTUNIQUE_HASH
Hash indexes use extendable hashing and provide O(1) equality lookups — significantly faster than LSM Tree for point queries (WHERE key = value). However, they do not support range scans (<, >, BETWEEN) or ordered iteration (ORDER BY). Use UNIQUE/NOTUNIQUE (LSM Tree) when you need range queries or ordering.
|
-
Create a full-text index and search it using
SEARCH_INDEX():
ArcadeDB> CREATE INDEX ON Employee (address) FULL_TEXT
SELECT * FROM Employee
WHERE SEARCH_INDEX('Employee[address]', 'New York')
LIKE and ILIKE do not use full-text indexes.
See how-to/data-modeling/full-text-index.adoc#full-text-index for analyzer configuration, Lucene query syntax, relevance scoring, and More Like This.
|
-
Create a manual index to store dates:
ArcadeDB> CREATE INDEX `mostRecentRecords` UNIQUE DATE
-
Create a basic LSMVectorIndex for similarity search on vector embeddings:
ArcadeDB> CREATE VERTEX TYPE Doc
ArcadeDB> CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {dimensions: 100, similarity: 'COSINE'}
By default, CREATE INDEX … LSM_VECTOR builds the HNSW graph immediately, so the index is ready to query as soon as the command completes. This is the recommended behavior for most use cases. If you prefer to defer graph construction to the first search (lazy mode), set buildGraphNow: false in the metadata.
|
-
Create an LSMVectorIndex with deferred graph building (useful when bulk-loading data after index creation):
ArcadeDB> CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {dimensions: 100, similarity: 'COSINE', buildGraphNow: false}
The graph will be built lazily on the first vectorNeighbors() query or when REBUILD INDEX is executed.
-
Create an LSMVectorIndex with DOT_PRODUCT similarity metric:
ArcadeDB> CREATE VERTEX TYPE Image
ArcadeDB> CREATE PROPERTY Image.featureVector ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON Image (featureVector) LSM_VECTOR METADATA {dimensions: 512, similarity: 'DOT_PRODUCT'}
-
Create an LSMVectorIndex with EUCLIDEAN distance metric and custom performance parameters:
ArcadeDB> CREATE VERTEX TYPE Product
ArcadeDB> CREATE PROPERTY Product.attributes ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON Product (attributes) LSM_VECTOR METADATA {dimensions: 256, similarity: 'EUCLIDEAN', maxConnections: 32, beamWidth: 200}
-
Create an LSMVectorIndex with minimal metadata:
ArcadeDB> CREATE VERTEX TYPE CompactDoc
ArcadeDB> CREATE PROPERTY CompactDoc.vec ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX IF NOT EXISTS ON CompactDoc (vec) LSM_VECTOR METADATA {dimensions: 4, similarity: 'COSINE'}
-
Create an LSMVectorIndex with tuned performance parameters for large-scale indexing:
ArcadeDB> CREATE VERTEX TYPE VectorVertex
ArcadeDB> CREATE PROPERTY VectorVertex.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX IF NOT EXISTS ON VectorVertex (embedding) LSM_VECTOR METADATA {dimensions: 1024, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}
-
Create an LSMVectorIndex with a custom efSearch for consistent high recall:
ArcadeDB> CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {dimensions: 768, similarity: 'COSINE', quantization: 'INT8', efSearch: 200}
When efSearch is set in the index metadata, all queries on this index use the specified beam width by default. Individual queries can still override it by passing efSearch as the 4th argument to vectorNeighbors().
-
Create an LSMVectorIndex on embeddings with metadata about source:
ArcadeDB> CREATE VERTEX TYPE Doc
ArcadeDB> CREATE PROPERTY Doc.content STRING
ArcadeDB> CREATE PROPERTY Doc.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE PROPERTY Doc.source STRING
ArcadeDB> CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {dimensions: 384, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}
Then query for similar documents:
ArcadeDB> SELECT expand(vectorNeighbors('Doc[embedding]', [0.1, 0.2, 0.3], 10))
-
Create multiple LSMVectorIndexes on different properties in the same type (multi-modal embeddings):
ArcadeDB> CREATE VERTEX TYPE MultiModalRecord
ArcadeDB> CREATE PROPERTY MultiModalRecord.imageEmbedding ARRAY_OF_FLOATS
ArcadeDB> CREATE PROPERTY MultiModalRecord.textEmbedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON MultiModalRecord (imageEmbedding) LSM_VECTOR METADATA {dimensions: 512, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}
ArcadeDB> CREATE INDEX ON MultiModalRecord (textEmbedding) LSM_VECTOR METADATA {dimensions: 768, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}
Query for similar records using the image embedding:
ArcadeDB> SELECT expand(vectorNeighbors('MultiModalRecord[imageEmbedding]', [0.1, 0.2, 0.3], 5))
-
Create an LSMVectorIndex with tuned parameters for high-dimensional data:
ArcadeDB> CREATE VERTEX TYPE Word
ArcadeDB> CREATE PROPERTY Word.vector ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON Word (vector) LSM_VECTOR METADATA {dimensions: 100, similarity: 'COSINE', maxConnections: 16, beamWidth: 100}
-
Create an LSMVectorIndex with INT8 quantization for 4x memory savings:
ArcadeDB> CREATE VERTEX TYPE LargeCorpus
ArcadeDB> CREATE PROPERTY LargeCorpus.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON LargeCorpus (embedding) LSM_VECTOR METADATA {dimensions: 1536, similarity: 'COSINE', quantization: 'INT8'}
The index will automatically quantize vectors on insert and dequantize on retrieval, reducing memory usage by 75% with minimal accuracy loss (<2%).
-
Create an LSMVectorIndex with BINARY quantization for extreme compression (32x memory savings):
ArcadeDB> CREATE VERTEX TYPE MassiveDataset
ArcadeDB> CREATE PROPERTY MassiveDataset.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON MassiveDataset (embedding) LSM_VECTOR METADATA {dimensions: 1536, similarity: 'COSINE', quantization: 'BINARY'}
Binary quantization compresses each dimension to 1 bit, achieving 97% memory reduction. Suitable for approximate search with reranking.
-
Create an LSMVectorIndex with PRODUCT quantization for zero-disk-I/O graph construction on large datasets:
ArcadeDB> CREATE VERTEX TYPE BigDataset
ArcadeDB> CREATE PROPERTY BigDataset.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON BigDataset (embedding) LSM_VECTOR METADATA {dimensions: 1024, similarity: 'COSINE', quantization: 'PRODUCT'}
Product quantization builds the graph using compact PQ codes that stay in memory, eliminating disk I/O during construction. Most effective on large datasets (100K+ vectors).
Quantization types: NONE (default, no compression), INT8 (recommended, 4x compression, <2% accuracy loss), BINARY (32x compression, approximate search), PRODUCT (16-64x compression, zero-disk-I/O graph build). Quantization is transparent - vectors are stored compressed but queries work the same way.
|
-
Create an LSMVectorIndex with inline vector storage for large-scale deployments (100K+ vectors):
ArcadeDB> CREATE VERTEX TYPE LargeScale
ArcadeDB> CREATE PROPERTY LargeScale.embedding ARRAY_OF_FLOATS
ArcadeDB> CREATE INDEX ON LargeScale (embedding) LSM_VECTOR METADATA {
dimensions: 1536,
similarity: 'COSINE',
quantization: 'INT8',
storeVectorsInGraph: true,
mutationsBeforeRebuild: 1000
}
The storeVectorsInGraph option stores vectors inline within the JVector graph file, eliminating expensive document lookups during search operations. This dramatically reduces RAM pressure for large indexes (100K+ vectors) by avoiding document page cache thrashing.
When to use storeVectorsInGraph:
-
Enable (
true) for: Large indexes (100K+ vectors), high search throughput, read-heavy workloads, RAM-constrained environments -
Disable (
false, default) for: Small indexes (<10K vectors), frequent vector updates, write-heavy workloads
Trade-offs:
-
Benefit: 70-80% RAM reduction for large indexes by co-locating vectors with graph topology
-
Cost: Vectors duplicated on disk (stored in both documents and graph file)
-
Best with: Combine with
quantization=INT8orBINARYfor maximum efficiency (reduced disk + reduced RAM)
Example - optimal large-scale configuration:
ArcadeDB> CREATE INDEX ON LargeScale (embedding) LSM_VECTOR METADATA {
dimensions: 1536,
similarity: 'COSINE',
quantization: 'INT8', -- 4x memory reduction (recommended)
storeVectorsInGraph: true, -- Fast co-located access
efSearch: 200, -- Consistent high recall
maxConnections: 16,
beamWidth: 100,
locationCacheSize: 200000, -- Cache hot vectors
mutationsBeforeRebuild: 1000, -- Reduce rebuild frequency
inactivityRebuildTimeoutMs: 30000 -- Flush buffered vectors after 30s of inactivity
}
-
Create a geospatial index on a WKT property and query it:
CREATE VERTEX TYPE Location
CREATE PROPERTY Location.coords STRING
CREATE INDEX ON Location (coords) GEOSPATIAL
INSERT INTO Location SET name = 'Eiffel Tower', coords = 'POINT(2.2945 48.8584)'
SELECT name FROM Location
WHERE geo.within(coords, geo.geomFromText(
'POLYGON((2.28 48.84, 2.42 48.84, 2.42 48.90, 2.28 48.90, 2.28 48.84))'
)) = true
See how-to/data-modeling/geospatial.adoc#geospatial for full documentation on geometry formats, predicate functions, and precision configuration.
For more information, see:
SQL - REBUILD INDEX
Rebuilds automatic indexes.
Syntax
REBUILD INDEX <index-name>
-
<index-name>It is the index name that you want to rebuild. Use*to rebuild all automatic indexes. Quote the index name if it contains special characters like square brackets.
During the rebuild, any idempotent queries made against the index, skip the index and perform sequential scans. This means
that queries run slower during this operation. Non-idempotent commands, such as INSERT, UPDATE,
and DELETE are blocked waiting until the indexes are rebuilt.
|
| During normal operations an index rebuild is not necessary. Rebuild an index only if it breaks. |
Examples
-
Rebuild an index on the
emailproperty on the typeProfile:
ArcadeDB> REBUILD INDEX `Profile[email]`
-
Rebuild all indexes:
ArcadeDB> REBUILD INDEX *
For more information, see:
SQL - DROP INDEX
Removes an index from a property defined in the schema.
If the index does not exist, this call just returns with no errors.
Syntax
DROP INDEX <index-name> [ IF EXISTS ]
-
<index-name>Defines the name of the index.
Examples
-
Remove the index on the
Idproperty of theUserstype:
ArcadeDB> DROP INDEX `Users[Id]`
For more information, see: