Schema

ArcadeDB can work in schema-less mode (like most of NoSQL DBMS), schema-full (like with RDBMS) or hybrid. The main API to manage the schema is the Schema interface you can obtain by calling the API db.getSchema():

Schema schema = db.getSchema();

Before creating any record it’s mandatory to define a type. If you’re going to create a new Document, then you need a Document Type. The same applies for Vertex → Vertex Type and Edge → Edge Type.

The specific API to manage document types in the Schema interface are:

DocumentType createDocumentType(String typeName);
DocumentType createDocumentType(String typeName, int buckets);
DocumentType createDocumentType(String typeName, int buckets, int pageSize);

Where:

  • typeName is the name of the type

  • buckets is the number of buckets to create. A bucket is like a file. If not specified, the number of available cores is used

  • pageSize is the page size for the file. If not specified is 65K. Pay attention to this value. In case of large objects to store, you need to increase the page size or the record won’t be stored, throwing an exception.

To manage vertex types, the API are similar as for the document types:

VertexType createVertexType(String typeName);
VertexType createVertexType(String typeName, int buckets);
VertexType createVertexType(String typeName, int buckets, int pageSize);

And the same for edge types:

EdgeType createEdgeType(String typeName);
EdgeType createEdgeType(String typeName, int buckets);
EdgeType createEdgeType(String typeName, int buckets, int pageSize);

In order to retrieve and removing a type, API common to any record type are provided:

Collection<DocumentType> getTypes();
DocumentType             getType(String typeName);
void                     dropType(String typeName);
String                   getTypeNameByBucketId(int bucketId);
DocumentType             getTypeByBucketId(int bucketId);
boolean                  existsType(String typeName);

Working with buckets

A bucket is like a file. A type can rely on one or multiple buckets. Why using multiple buckets? Because ArcadeDB could lock a bucket for certain operations. Having multiple buckets allows to go in parallel with a multi-cpus and multi-cores architecture.

The specific API to manage buckets are:

Bucket             createBucket(String bucketName);
boolean            existsBucket(String bucketName);
Bucket             getBucketById(int id);
Bucket             getBucketByName(String name);
Collection<Bucket> getBuckets();

Working with indexes

Like any other DBMS, ArcadeDB has indexes. Even if indexes are not used to manage relationships (because ArcadeDB has a native GraphDB engine based on links), indexes are fundamental for a quick lookup of records by one or multiple properties.

By default null values are not indexed, so any query that is looking for null values will not use the index but instead result in a full scan.

ArcadeDB provides automatic and manual indexes:

  • automatic that are updated automatically when you work with records

  • manual are detached from a type and the user is totally responsible to insert and remove entries into and from the index

The Schema API to manage indexes are:

TypeIndex createTypeIndex(SchemaImpl.INDEX_TYPE indexType, boolean unique, String typeName, String[] propertyNames);

TypeIndex createTypeIndex(SchemaImpl.INDEX_TYPE indexType, boolean unique, String typeName, String[] propertyNames, int pageSize);

TypeIndex createTypeIndex(EmbeddedSchema.INDEX_TYPE indexType, boolean unique, String typeName, String[] propertyNames, int pageSize, Index.BuildIndexCallback callback);

TypeIndex createTypeIndex(EmbeddedSchema.INDEX_TYPE indexType, boolean unique, String typeName, String[] propertyNames, int pageSize, LSMTreeIndexAbstract.NULL_STRATEGY nullStrategy, Index.BuildIndexCallback callback);

boolean existsIndex(String indexName);

Index[] getIndexes();

Index   getIndexByName(String indexName);

Where:

  • indexName is the name of the index

  • indexType can be:

    • LSM_TREE, implemented as a Log Structured Merge tree. Best for range scans and ordered iteration.

    • HASH, implemented using extendable hashing. Provides O(1) equality lookups — best for primary keys, JOINs, and edge traversal where ordering is not needed. Does not support range queries.

    • FULL_TEXT, that uses Lucene’s Analyzers for tokenizing, stemming and categorize words inside a text. Internally it’s managed as a LSM_TREE

  • unique tells if the entries in the index must be unique or they can be repeated

  • typeName is the name of the type (document, vertex or edge) where the index must be applied

  • propertyNames is the array of property names to index. In case of more than one property is used, the index is composed. Property names may also reference a document type’s polymorphic properties

  • pageSize is the page size. If not specified, the default of 2MB is used

  • nullStrategy can be:

    • SKIP ignores all null values when creating and reading an index, this is the default.

    • ERROR throws an exception when creating or reading a null value indexed property.

  • callback is a 2 argument Functional interface called when each Document is added to this index. This is most likely only useful for internal use cases. The callback receives the Document just indexed, and a running count of the total number of documents indexed in the bucket.

The API above sometimes returns TypeIndex vs Index. The more general getIndexes and getIndexByName methods return Index because the index might be a bucket index or a type index. The TypeIndex should be useful in most cases, but sometimes it may be useful to work at the bucket level.

In addition to the createTypeIndex methods above, for each overload there is a matching getOrCreateTypeIndex method which returns an existing index rather than throwing an exception if a matching index is found. There are also matching index API creation methods available via the DocumentType for creating an index directly from a document type, rather than against the general Schema object. When using the DocumentType convenience variants typeName is provided automatically.

An index is unique to a set of property names per document type. Any attempt to create a second index on a non-unique set of [typeName, propertyNames] will throw an exception using the createTypeIndex method variants.

A special mention goes for the method createManualIndex() that creates indexes not attached to any type (manual):

Index createManualIndex(SchemaImpl.INDEX_TYPE indexType, boolean unique, String indexName, com.arcadedb.schema.Type[] keyTypes, int pageSize);

While by default indexes are updated automatically when you work with records, in this case, the user is totally responsible to insert and remove entries into and from the index.

Indexing Edges

Like any other document type, indexes may be defined for Edge types as well. If the property name for the index is either @out or @in, the index property will be a LINK type on the adjacently referenced Vertex.

The LINK type represents @RIDs (like #13:222). Usually creating LINK indexes is meant for indexing incoming/outgoing edges in order to prevent multigraphs (i.e. duplicates edges between the same vertex pairs).

Database Configuration

ArcadeDB stores the database configuration into the schema and allows to change things like the timezone, the format of dates and the encoding:

TimeZone getTimeZone();
void     setTimeZone(TimeZone timeZone);
String   getDateFormat();
void     setDateFormat(String dateFormat);
String   getDateTimeFormat();
void     setDateTimeFormat(String dateTimeFormat);
String   getEncoding();