What is ArcadeDB?
ArcadeDB is a multi-model database management system: graph, document, key/value, full-text, time-series, and vector data live on the same storage and inside the same transaction. One process, one schema, one query language of your choice — instead of a separate database for each shape of data.
If you’ve ever stitched together PostgreSQL, MongoDB, Neo4j, Elasticsearch, and a vector database to back a single application, this page explains why ArcadeDB collapses that stack into one engine, and why we believe the graph model (more on that below) is a better default than people realise.
TL;DR
- Six data models — graph, document, key/value, full-text, vector, time-series — share the same storage and transactions.
- Six query languages — SQL, Cypher, Gremlin, GraphQL, MongoDB QL, Redis — read and write the same records.
- Native graph: edges are physical pointers between records, so every traversal hop is O(1).
- Embeddable in any JVM (16 MB heap minimum) or run as a server with Docker / Kubernetes.
- Apache 2.0 — free for commercial use, no copyleft, no Enterprise edition gate.
The problem with running many databases
Most teams that need to combine relationships, JSON, full-text search and vector similarity end up with polyglot persistence: PostgreSQL for transactional data, Neo4j for the graph, Elasticsearch for search, MongoDB for flexible documents, Pinecone or Qdrant for embeddings. Each system brings:
- a separate deployment, backup strategy, security model, and failure mode;
- an ETL pipeline (or change-data-capture queue) to keep data in sync — every sync introduces lag, complexity, and bugs;
- a learning curve per database for every developer on the team;
- a multiplier on infrastructure cost and operational headcount.
A single user request often turns into a chain of sequential calls across these systems: query Postgres, then Neo4j, then Elasticsearch, then stitch the results together in the application layer. The total response time is the sum of every hop, plus the cost of serializing data three times.
The latencies below are illustrative, but the proportions hold: by the time the polyglot stack finishes one round of three sequential calls, ArcadeDB has completed several full transactions.
| Polyglot persistence (sequential) | Database | Latency |
|---|---|---|
| Your app → PostgreSQL | Relational | ~15 ms |
| then … → MongoDB | Document | ~20 ms |
| then … → Neo4j | Graph | ~12 ms |
| Total: ~47 ms+ (sum of calls + serialization overhead) | | |
| ArcadeDB (single call) | Engine | Latency |
|---|---|---|
| Your app → ArcadeDB (graph + document + search in one query) | Multi-model | ~5 ms |
| Total: ~5 ms (one call, zero chaining, no serialization) | | |
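The arithmetic behind the comparison is simple addition: sequential calls sum, a single call does not. A toy sketch (the millisecond figures are the illustrative ones from the tables, not measurements):

```java
// Cost model for chained database calls: total latency is the sum of
// every hop, while a single multi-model call pays one round-trip.
public class LatencyModel {
    public static double sequential(double... callsMs) {
        double total = 0;
        for (double c : callsMs) total += c;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sequential(15, 20, 12)); // 47.0 ms, before serialization overhead
        System.out.println(sequential(5));          // 5.0 ms — one call
    }
}
```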
One engine, every model
ArcadeDB collapses the polyglot stack into a single engine. A single ACID transaction can:
- create a JSON document,
- connect it as a vertex in a graph,
- index its embedding for vector similarity,
- and append a time-series sample for it

— atomically. There is no replication lag between models, because there is no replication: the data exists once, on one storage layer.
The consequences run deeper than just lower latency:
- Lower latency — one network round-trip, no waiting on the slowest database in the chain.
- One operational model — one backup, one HA topology, one monitoring dashboard, one upgrade path.
- One learning curve — your team learns ArcadeDB once, not five products.
- Real cross-model queries — a SQL statement can traverse a graph, filter on document fields, and rank by vector similarity in a single pass:
```sql
SELECT FROM (
  MATCH {type: User, as: u} -Follows-> {type: User, as: f}
  RETURN f
) WHERE vectorCosineSimilarity(f.embedding, ?) > 0.8
```
The graph traversal and the vector comparison run against the same records in the same transaction. No cross-database join, no eventual consistency, no glue code.
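Conceptually, the similarity filter in that query reduces to the standard cosine metric: the dot product of two vectors divided by the product of their magnitudes. A minimal Java sketch of the computation (this illustrates the metric only, not ArcadeDB's implementation of `vectorCosineSimilarity`):

```java
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns a value in [-1, 1];
// vectors pointing in the same direction score close to 1.
public final class CosineSimilarity {
    public static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1.0f, 0.0f, 1.0f};
        float[] candidate = {0.9f, 0.1f, 1.1f};
        // Nearly identical direction, so the score lands close to 1 —
        // this candidate would pass the > 0.8 filter in the query above.
        System.out.println(cosine(query, candidate) > 0.8);
    }
}
```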
The graph model is our core — and it’s native
ArcadeDB is a native graph database. That phrase carries weight, so let’s unpack it.
In a relational database, a relationship between two rows is a foreign key — a value in one table that matches a value in another. To follow it, the database engine must look up the other side, typically via an index. The cost grows with the size of the joined tables: a JOIN on millions of rows forces the planner to choose between a hash join, a merge join, or a nested loop, and none of them is free. As your data grows, your queries slow down.
In a native graph, edges are physical links between records. Following an edge dereferences a pointer; the database does not scan, does not probe an index, does not build a hash table. Every traversal hop is O(1) regardless of database size. This is called index-free adjacency, and it changes the kind of question you can ask:
- "Friends of friends of friends of Alice" — three pointer hops, milliseconds, no matter how many users you have.
- "Every person within 4 degrees of someone who bought product X in the last hour" — a traversal, not a query plan with three nested joins.
- "Shortest path between two arbitrary entities in a 100-million-vertex graph" — feasible.
In ArcadeDB, edges are first-class records: a Follows edge can carry properties like since, weight, label. Lightweight edges exist for relationships with no properties (zero per-edge storage). Both are stored as physical links — there are no join tables, no adjacency lists bolted onto a relational schema, no LINKBAG-style indirection.
This is also why we recommend the graph model as the default even for "non-graph" applications. Anything you can model as parent/child, owner/owned, follower/followee, before/after, or "belongs to" is naturally a graph. Modelling it as a graph means a JOIN becomes a single pointer dereference, and your query keeps the same speed when the table grows from a thousand rows to a billion.
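The effect of index-free adjacency can be illustrated with plain object references: each vertex holds direct links to its neighbours, so a traversal hop is a pointer dereference with no index probe. This toy in-memory model shows the idea only — ArcadeDB stores edges as physical links on disk, not as Java object references:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of index-free adjacency: following an edge is a direct
// reference, so the cost per hop is constant regardless of graph size.
public class AdjacencyDemo {
    public static final class Vertex {
        public final String name;
        public final List<Vertex> out = new ArrayList<>(); // direct links to neighbours
        public Vertex(String name) { this.name = name; }
    }

    // Collect every vertex reachable within `depth` hops of `start`,
    // e.g. depth = 2 answers "friends of friends".
    public static Set<String> within(Vertex start, int depth) {
        Set<String> seen = new HashSet<>();
        walk(start, depth, seen);
        seen.remove(start.name);
        return seen;
    }

    private static void walk(Vertex v, int depth, Set<String> seen) {
        if (depth == 0) return;
        for (Vertex n : v.out) {
            if (seen.add(n.name)) walk(n, depth - 1, seen);
        }
    }

    public static void main(String[] args) {
        Vertex alice = new Vertex("Alice"), bob = new Vertex("Bob"), carol = new Vertex("Carol");
        alice.out.add(bob);
        bob.out.add(carol);
        System.out.println(within(alice, 2)); // contains Bob and Carol — two pointer hops
    }
}
```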
The other models you get
Documents
Documents are JSON-like records with nested properties, lists, and maps. Document types can be schema-full (every property declared with a type and constraints), schema-less (no validation, free-form), or schema-hybrid (some properties enforced, others free-form). Types support inheritance — a Customer type can extend Person and inherit its properties, indexes, and constraints.
Key-value
Every record has a Record ID (RID); fetching by RID is O(1). Buckets can also act as dedicated key/value namespaces — useful for caching, session storage, and configuration without a type schema.
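The reason a RID fetch is O(1) is that the identifier is the physical address: it encodes the bucket and the record's position inside it, so no index is consulted. A sketch of the textual form `#<bucket>:<position>` (the field names here are illustrative, not ArcadeDB's internal types):

```java
// A Record ID names a record by location, not by value: bucket plus
// position. Parsing the textual form "#<bucket>:<position>" is all that
// is needed to address the record directly — no index lookup.
public record Rid(int bucketId, long position) {
    public static Rid parse(String rid) {
        if (!rid.startsWith("#") || rid.indexOf(':') < 0)
            throw new IllegalArgumentException("expected #<bucket>:<position>, got: " + rid);
        int colon = rid.indexOf(':');
        return new Rid(
            Integer.parseInt(rid.substring(1, colon)),
            Long.parseLong(rid.substring(colon + 1)));
    }

    @Override
    public String toString() {
        return "#" + bucketId + ":" + position;
    }
}
```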
Full-text search
Built-in Lucene engine with fuzzy matching, integrated into the query languages. No external Elasticsearch deployment required for typical search workloads.
Time-series
Columnar storage with Gorilla XOR compression for floats and Delta-of-Delta for timestamps — 0.4 to 1.4 bytes per sample. Ingestion via the InfluxDB Line Protocol (works with Telegraf, Grafana Agent, etc.); query with PromQL or SQL analytical functions like ts.timeBucket, ts.rate, ts.percentile.
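Delta-of-delta exploits the regularity of timestamps: samples arriving at a fixed interval produce a second difference of zero, which a bit-level encoder can store in a single bit — that is where sub-byte-per-sample figures come from. A hedged sketch of the encoding step (the idea behind Gorilla-style compression, not ArcadeDB's on-disk format):

```java
import java.util.ArrayList;
import java.util.List;

// Delta-of-delta encoding: store the first timestamp verbatim, then for
// each subsequent sample store the change in the delta. Regular intervals
// yield long runs of zeros, which compress to almost nothing.
public class DeltaOfDelta {
    public static List<Long> encode(long[] timestamps) {
        List<Long> out = new ArrayList<>();
        long prev = 0, prevDelta = 0;
        for (int i = 0; i < timestamps.length; i++) {
            if (i == 0) {
                out.add(timestamps[0]);      // header: first value verbatim
            } else {
                long delta = timestamps[i] - prev;
                out.add(delta - prevDelta);  // 0 whenever the interval is steady
                prevDelta = delta;
            }
            prev = timestamps[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Samples every 10 s, with one sample arriving 1 s late.
        long[] ts = {1000, 1010, 1020, 1030, 1041};
        System.out.println(encode(ts)); // [1000, 10, 0, 0, 1]
    }
}
```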
Vector
JVector indexes (HNSW or Vamana / DiskANN) with COSINE, DOT_PRODUCT, and EUCLIDEAN similarity, and INT8 / BINARY quantisation for memory savings. Vector indexes participate in ACID transactions and integrate directly into SQL through vectorNeighbors() and vectorCosineSimilarity().
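The memory saving from INT8 quantisation comes from replacing each 4-byte float with a 1-byte integer plus a shared scale factor. A minimal sketch of symmetric INT8 quantisation under that assumption — illustrative only, not JVector's actual scheme:

```java
// Symmetric INT8 quantisation: scale each component against the vector's
// max absolute value so it fits in a signed byte, keeping the scale for
// approximate reconstruction. 4 bytes per dimension become 1.
public class Int8Quantizer {
    // Quantised values plus the scale needed to reconstruct floats.
    public record Quantized(byte[] values, float scale) {}

    public static Quantized quantize(float[] v) {
        float maxAbs = 1e-9f; // avoid division by zero for all-zero vectors
        for (float x : v) maxAbs = Math.max(maxAbs, Math.abs(x));
        byte[] q = new byte[v.length];
        for (int i = 0; i < v.length; i++)
            q[i] = (byte) Math.round(v[i] / maxAbs * 127f);
        return new Quantized(q, maxAbs / 127f);
    }

    public static float[] dequantize(Quantized q) {
        float[] out = new float[q.values().length];
        for (int i = 0; i < out.length; i++)
            out[i] = q.values()[i] * q.scale();
        return out;
    }
}
```

The round trip is lossy — each component lands within one quantisation step of the original — which is why quantised indexes trade a little recall for a 4× smaller memory footprint.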
Pick the query language your team already knows
ArcadeDB speaks the languages your team is comfortable with. They all read and write the same records on the same storage:
- SQL — extended with graph traversal, full-text, vector, and time-series functions
- Cypher (openCypher) — drop-in replacement for Neo4j Cypher, 97.8% TCK compliant
- Gremlin — Apache TinkerPop graph traversal
- GraphQL — schema-driven queries over HTTP
- MongoDB QL — document-oriented queries via the MongoDB protocol
- Redis commands — key/value access via the Redis wire protocol
A record created through SQL is immediately visible to a Gremlin traversal, and an edge created through Cypher can be queried through GraphQL.
Embedded or server, your choice
ArcadeDB runs in two modes:
- Embedded — link the JAR into a JVM application and run the database in-process. No separate server, no network hop, no IPC overhead. Footprint starts at ~16 MB of heap. Apache 2.0; no Enterprise edition required.
- Server — run as a standalone process and connect from any language over HTTP/REST, the PostgreSQL wire protocol, the Neo4j Bolt protocol, MongoDB drivers, Redis clients, the gRPC API, or MCP for AI agents.
Many ArcadeDB deployments mix both: an embedded engine inside a hot-path service plus a server cluster for analytics and external clients, all sharing the same database files.
Why it’s so fast
ArcadeDB is written in Low-Level Java — Java 21+, but without high-level abstractions on the hot path. Few objects are allocated at run time, so the garbage collector rarely needs to act. The kernel uses cache-line-aware data layouts, lock-free reads where possible, and other mechanical-sympathy techniques to make the JVM behave more like a hand-tuned native binary.
The storage engine is an LSM-tree with bucket partitioning, page caching, and a write-ahead log. On the LDBC Graphalytics benchmark, ArcadeDB outperforms popular graph databases by 5×–80× on PageRank, BFS, and weakly-connected-components workloads.
Apache 2.0, free for commercial use
ArcadeDB Community Edition is released under the Apache 2.0 license — free for any purpose, commercial use included, no copyleft. You can embed it in proprietary products without publishing your source.
When code is public, anyone can scrutinize, test, report, and resolve issues. Open source moves faster than the proprietary world, and the trust comes from being able to read the code rather than a marketing page.
For mission-critical production deployments, professional support is available with guaranteed response times and SLA coverage. See the SLA page for service-level details.
Where to next
- Run ArcadeDB — install with Docker, native binaries, or Kubernetes
- Multi-model architecture — see how all six models share one engine
- 10-minute Java tutorial — embedded mode quickstart
- Python quickstart — connect via PostgreSQL or HTTP
- ArcadeDB Academy — free, self-paced courses with certification