High Availability
ArcadeDB supports a High Availability (HA) mode where multiple servers share the same databases via Raft-based replication. All servers of a cluster serve the same set of databases.
Note: Starting with v26.4.1, HA is powered by Apache Ratis, a production-grade implementation of the Raft consensus protocol. The old custom replication protocol has been removed. For the underlying concepts, see High Availability Concepts.
Quick Start
To start an ArcadeDB server with HA enabled, at minimum you need to:
- Set `arcadedb.ha.enabled` to `true`.
- Define the list of peers in the cluster via `arcadedb.ha.serverList` (if you are deploying on Kubernetes, see Kubernetes). The value is a comma-separated list of entries in the format `[name@]host:raftPort:httpPort[:priority]`. The optional `name@` prefix is described in Named Peers.
- Optionally set the local server name via `arcadedb.server.name`. Each node must have a unique name. If not specified, the default is `ArcadeDB_0`.
Example starting a 3-node cluster on a single host (different ports):
$ bin/server.sh -Darcadedb.ha.enabled=true \
-Darcadedb.server.name=node1 \
-Darcadedb.ha.serverList=localhost:2434:2480,localhost:2435:2481,localhost:2436:2482
The Raft gRPC port is 2434 by default and is configurable via arcadedb.ha.raftPort.
The HTTP port in each entry is required so that replicas can forward non-idempotent requests to the leader over HTTP.
The cluster name is arcadedb by default; set arcadedb.ha.clusterName=<name> to run multiple independent clusters on the same network.
Named Peers (from v26.5.1)
By default, each node identifies its slot in arcadedb.ha.serverList from the numeric suffix of arcadedb.server.name (for example, ArcadeDB_0 maps to the first entry, ArcadeDB_1 to the second, and so on).
Display names shown in logs and Studio are then synthesized from the local node’s prefix.
This positional convention works well for Kubernetes StatefulSets but is awkward when nodes have human-readable names such as frankfurt, london, nyc.
You can add an optional name@ prefix to any entry in arcadedb.ha.serverList to give each peer a stable, human-readable name:
$ bin/server.sh -Darcadedb.ha.enabled=true \
-Darcadedb.server.name=frankfurt \
[email protected]:2434:2480,[email protected]:2434:2480,[email protected]:2434:2480
When peer names are configured, each node identifies its slot by matching arcadedb.server.name against the configured peer names first; if no match is found, it falls back to the legacy prefix_N / prefix-N suffix resolution.
This means:
- Server names no longer require a numeric suffix when peer names are used.
- Display names shown in logs and Studio reflect the configured peer name (e.g. `frankfurt (10.0.0.1:2480)` instead of `ArcadeDB_0 (10.0.0.1:2480)`).
- Mixed clusters work: entries with `name@` get their explicit name; entries without it fall back to the existing positional synthesis.
- Peer names must be unique within the cluster.
The Raft peer ID is still derived from the address (e.g. 10.0.0.1_2434), so changing or omitting peer names does not affect Raft identity stability.
Architecture
ArcadeDB uses a leader/replica model with Raft consensus. At any time one server holds leadership and accepts writes; the others are replicas that serve reads and stand by for failover.
Each server persists its own Raft log segments under <rootPath>/raft-storage/.
The log is used for recovery after restart and to replicate state to peers that fall behind.
Any read (query) can execute on any server in the cluster. All writes must go through the leader: a replica that receives a write transparently forwards it to the current leader via HTTP.
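For example, a client can send a write to any node through the standard HTTP command API; if that node is a replica, the request is transparently proxied to the leader (host name and password are illustrative):

$ curl -u root:&lt;password&gt; \
    -X POST http://replica-host:2480/api/v1/command/mydb \
    -H 'Content-Type: application/json' \
    -d '{"language": "sql", "command": "CREATE VERTEX TYPE User"}'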
Internally, every committed write follows a replicate-first, commit-after protocol in three phases:
- Phase 1 (read lock): capture the WAL pages produced by the transaction.
- Phase 2 (no lock): append the payload to the Raft log and wait for quorum acknowledgment.
- Phase 3 (read lock): apply the pages to the local database and return to the client.
If Phase 2 times out or fails, Phase 3 never runs: no local writes, no divergence.
Multiple concurrent transactions are batched into fewer Raft round-trips via group commit (HA_RAFT_GROUP_COMMIT_BATCH_SIZE).
Write Quorum
A quorum is the number of peers that must acknowledge a write before it commits.
Configure it via arcadedb.ha.quorum:
- `majority` (default): a majority of peers must acknowledge (standard Raft).
- `all`: every configured peer must acknowledge.
If the configured quorum is not met within arcadedb.ha.quorumTimeout milliseconds, the transaction is rolled back and an error is returned to the client.
Note: Starting with v26.4.1, the legacy quorum values from the old replication protocol are no longer supported; use `majority` or `all`.
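For example, to require acknowledgment from every peer and allow a longer wait (values illustrative):

$ bin/server.sh -Darcadedb.ha.enabled=true \
    -Darcadedb.ha.quorum=all \
    -Darcadedb.ha.quorumTimeout=30000 \
    -Darcadedb.ha.serverList=localhost:2434:2480,localhost:2435:2481,localhost:2436:2482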
Read Consistency (from v26.4.1)
When a read runs on a replica, ArcadeDB offers three consistency levels, configurable per-server via arcadedb.ha.readConsistency:
| Level | Behavior |
|---|---|
| `eventual` | Read locally without waiting. Fastest, but may return data that was committed on the leader after the replica's last apply. |
| `read_your_writes` | The replica waits until the Raft log index corresponding to the client's most recent write has been applied locally before serving the read. |
| `linearizable` | The replica confirms the current commit index with the leader via a Raft round-trip before serving the read, so the result reflects every write committed cluster-wide. |
Clients communicate the consistency contract through three HTTP headers:
| Header | Direction | Purpose |
|---|---|---|
| Consistency-level header | request | Overrides the server default for this request. Accepts the same values as `arcadedb.ha.readConsistency`. |
| Commit-index bookmark header | request | Client-supplied bookmark: the Raft commit index the replica must have applied before serving the read. Used to implement read-your-writes across connections. |
| Last-applied-index header | response | Current last-applied commit index, echoed on every response. Clients capture this value and send it back as the bookmark on subsequent requests. |
Older clients that sent X-ArcadeDB-Commit-Index on requests are still accepted as a backward-compatible fallback.
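As an illustration, a raw HTTP client that just performed a write can pin a follow-up read on a replica to that write's commit index using the backward-compatible request header named above; `42` stands in for an index captured from the write response:

$ curl -u root:&lt;password&gt; \
    -X POST http://replica-host:2480/api/v1/query/mydb \
    -H 'X-ArcadeDB-Commit-Index: 42' \
    -H 'Content-Type: application/json' \
    -d '{"language": "sql", "command": "SELECT FROM User"}'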
Load Shedding and Backpressure (from v26.4.1)
The Raft group-commit queue is bounded (arcadedb.ha.groupCommitQueueSize, default 10000 pending transactions).
When a write arrives but the queue is full, the server waits up to arcadedb.ha.groupCommitOfferTimeout ms (default 100) for a slot, and if still full throws ReplicationQueueFullException — a NeedRetryException that clients automatically retry with backoff.
This protects the server from OOM under sustained write overload and lets clients back off gracefully.
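The official clients retry `NeedRetryException` automatically; a raw HTTP client can approximate this with a backoff loop. A minimal sketch (the retry condition is simplified to any non-2xx status; endpoint and credentials are illustrative):

# Retry a write up to 5 times with growing backoff.
for attempt in 1 2 3 4 5; do
  code=$(curl -s -o /dev/null -w '%{http_code}' -u root:&lt;password&gt; \
    -X POST http://node1:2480/api/v1/command/mydb \
    -H 'Content-Type: application/json' \
    -d '{"language": "sql", "command": "CREATE VERTEX TYPE Event IF NOT EXISTS"}')
  [ "$code" -lt 300 ] && break   # committed; stop retrying
  sleep $attempt                 # back off before the next attempt
done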
Witness / Read-Scale Nodes (from v26.4.1)
Set arcadedb.ha.serverRole=replica to pin a node as a permanent replica.
The node’s Raft priority is set to 0 at startup so it is never elected leader, while still receiving all writes and serving reads.
Useful for read-scale deployments or for cross-DC witnesses that should not take over as leader.
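For example, starting the third node of the quick-start cluster as a permanent replica:

$ bin/server.sh -Darcadedb.ha.enabled=true \
    -Darcadedb.server.name=node3 \
    -Darcadedb.ha.serverRole=replica \
    -Darcadedb.ha.serverList=localhost:2434:2480,localhost:2435:2481,localhost:2436:2482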
Automatic Failover
If the leader becomes unreachable, replicas start a new Raft election. A replica with an up-to-date log is elected as the new leader and the cluster resumes serving writes. Pre-vote prevents partitioned nodes from triggering disruptive elections.
Common causes of leader unavailability include:
-
The ArcadeDB server process has been terminated.
-
The physical or virtual host has been shut down or rebooted.
-
Network issues prevent the leader from reaching a majority of peers.
When a replica rejoins after being offline, it catches up via the Raft log automatically; if it has fallen behind past the log purge boundary, the leader streams a full database snapshot over HTTP and the replica installs it atomically.
Cluster Management REST API (from v26.4.1)
Cluster membership and leadership can be changed at runtime through dedicated REST endpoints.
All endpoints require authentication as the root user.
| Method | Path | Description |
|---|---|---|
| GET | | Return the current cluster status: current leader, peers and their roles, current term, commit and applied indices, and per-follower replication lag. |
| POST | `/api/v1/cluster/peer` | Add a peer to the running cluster. Body: `{"peerId": "...", "address": "host:raftPort:httpPort"}`; from v26.5.1 an optional `"name"` field sets a human-readable peer name (see the examples below). |
| DELETE | | Remove a peer from the cluster. |
| POST | | Transfer leadership to the peer indicated in the request body. |
| POST | | Make the current leader step down. The cluster elects a new leader. |
| POST | `/api/v1/cluster/leave` | Gracefully remove this server from the cluster. If the local server is the leader, leadership is transferred first. This endpoint is used by the Kubernetes `preStop` hook on scale-down. |
| POST | | Compare component file checksums for the given database across all peers. Useful to confirm that all nodes have converged. |
| GET | | Leader-only: stream a ZIP of the database for follower catch-up. Requires the cluster token and is used internally by snapshot recovery. |
Example: add a new peer at runtime.
$ curl -u root:<password> \
-X POST http://leader:2480/api/v1/cluster/peer \
-H 'Content-Type: application/json' \
-d '{"peerId":"node4","address":"10.0.0.4:2434:2480"}'
Adding a peer with a human-readable name (from v26.5.1):
$ curl -u root:<password> \
-X POST http://leader:2480/api/v1/cluster/peer \
-H 'Content-Type: application/json' \
-d '{"peerId":"10.0.0.4_2434","address":"10.0.0.4:2434:2480","name":"frankfurt"}'
Studio Cluster Dashboard (from v26.4.1)
ArcadeDB Studio exposes a Cluster tab (visible when HA is enabled) that displays the current leader, peer roles, Raft term and commit index, per-follower replication lag, and provides buttons for leadership transfer, peer add/remove, and database verification. See Studio.
Security
- Cluster token: Inter-node HTTP forwarding (replica → leader proxy, snapshot downloads) is authenticated with a shared cluster token sent as the `X-ArcadeDB-Cluster-Token` HTTP header. If `arcadedb.ha.clusterToken` is not set explicitly, the token is auto-generated on first startup and persisted under `raft-storage/`; all subsequent nodes must start with the same token (or with `HA_CLUSTER_TOKEN` unset so they can read it from the same storage). Set it explicitly for production deployments to coordinate the token across nodes.
- Peer allowlist: Inbound Raft gRPC connections are filtered against the DNS-resolved hosts in `arcadedb.ha.serverList`; connections from other addresses are rejected. This closes the "any host that knows the port can inject log entries" vector. It is not a substitute for mTLS: deploy HA behind a private network, NetworkPolicy, or service mesh on untrusted networks.
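For example, pinning the token explicitly so that every node starts with the same secret (placeholder value; inject it from your secret manager in practice):

$ bin/server.sh -Darcadedb.ha.enabled=true \
    -Darcadedb.ha.clusterToken=&lt;shared-secret&gt; \
    -Darcadedb.ha.serverList=localhost:2434:2480,localhost:2435:2481,localhost:2436:2482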
Kubernetes
ArcadeDB supports the standard StatefulSet + Headless Service pattern for Raft deployments (similar to etcd, Apache Ozone, CockroachDB).
The official Helm chart pre-computes the full server list from replicaCount and injects it via environment variables.
Minimal configuration:
arcadedb.ha.enabled=true
arcadedb.ha.k8s=true
arcadedb.ha.k8sSuffix=.arcadedb.default.svc.cluster.local
arcadedb.ha.serverList=arcadedb-0.arcadedb.default.svc.cluster.local:2434:2480,arcadedb-1.arcadedb.default.svc.cluster.local:2434:2480,arcadedb-2.arcadedb.default.svc.cluster.local:2434:2480
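With the official Helm chart, the server list above is generated for you from `replicaCount`; a sketch (the chart reference is a placeholder, check the chart's own documentation for the exact name and values):

$ helm install arcadedb &lt;chart-reference&gt; --set replicaCount=3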
- Auto-join on scale-up (from v26.4.1): When `arcadedb.ha.k8s=true` and a new pod starts without an existing Raft storage directory, the server automatically joins the existing cluster via the Raft `SetConfiguration(ADD)` admin API. This enables zero-downtime scale-up: `kubectl scale statefulset arcadedb --replicas=5` adds new pods that join the existing cluster without restarting the existing peers.
- Auto-leave on scale-down: The Helm chart installs a `preStop` hook that calls `POST /api/v1/cluster/leave` so the terminating pod cleanly transfers leadership (if it holds it) and is removed from the Raft group before shutdown. A sketch of such a hook follows below.
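A minimal sketch of such a hook, assuming `curl` is available in the container image and the root password is exposed via a hypothetical `ARCADEDB_ROOT_PASSWORD` environment variable (the chart's actual manifest may differ):

lifecycle:
  preStop:
    exec:
      command:
        - sh
        - -c
        - |
          # Leave the Raft group; transfers leadership first if this pod is the leader.
          curl -s -u "root:${ARCADEDB_ROOT_PASSWORD}" \
            -X POST http://localhost:2480/api/v1/cluster/leave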
Troubleshooting
Performance: insertion is slow
ArcadeDB uses an optimistic concurrency model: if two threads try to update the same page, the first wins, the second fails with a `ConcurrentModificationException`, and the client retries the transaction (a configurable number of times).
In HA mode the retry window is wider because file locks are held during the Raft round-trip.
If you are inserting many records in parallel, allocate one bucket per thread to eliminate contention.
Example for the vertex type User:
ALTER TYPE User BucketSelectionStrategy `thread`
With enough buckets, parallel insertions avoid page contention entirely and do not hit the retry path.
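If the type has fewer buckets than insertion threads, add buckets first, sketched here with ArcadeDB's SQL bucket commands (bucket name illustrative):

CREATE BUCKET User_t1
ALTER TYPE User BUCKET +User_t1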
HA Settings
The following settings control HA behavior. A complete list of all HA parameters is available in Server Settings.
| Setting | Description | Default Value |
|---|---|---|
| `arcadedb.ha.enabled` | Enables HA for this server | false |
| `arcadedb.ha.clusterName` | Cluster name. Useful when running multiple clusters in the same network | arcadedb |
| `arcadedb.ha.serverList` | Comma-separated list of peers in the format `[name@]host:raftPort:httpPort[:priority]` | (empty) |
| `arcadedb.ha.raftPort` | Default Raft gRPC port used when an entry in `arcadedb.ha.serverList` does not specify one | 2434 |
| `arcadedb.ha.quorum` | Write quorum: `majority` or `all` | majority |
| `arcadedb.ha.quorumTimeout` | Timeout in ms waiting for the quorum acknowledgment | 10000 |
| `arcadedb.ha.readConsistency` | Default read consistency for follower reads: `eventual`, `read_your_writes`, or `linearizable` | read_your_writes |
| | Minimum and maximum Raft election timeout in ms. Increase for WAN clusters | 2000 / 5000 |
| `arcadedb.ha.serverRole` | Node role: `any` (may be elected leader) or `replica` (never elected leader) | any |
| | Number of Raft log entries after which the leader takes a snapshot | 100000 |
| `arcadedb.ha.groupCommitQueueSize` / `arcadedb.ha.groupCommitOfferTimeout` | Bounded queue size and offer timeout (ms) for Raft group-commit backpressure | 10000 / 100 |
| | If true, the Raft storage directory is preserved across restarts, enabling rejoin without a full snapshot resync | false |
| | If true, the JVM exits after exhausting step-down retries on a phase-2 replication failure. Default false keeps the server up and logs CRITICAL | false |
| `arcadedb.ha.clusterToken` | Shared secret for inter-node authentication. If empty, auto-generated at first startup and persisted under `raft-storage/` | (empty) |
| | Verbose HA logging: 0=off, 1=basic, 2=detailed, 3=trace | 0 |
| `arcadedb.ha.k8s` | The server is running inside Kubernetes (enables auto-join) | false |
| `arcadedb.ha.k8sSuffix` | DNS suffix used to reach the other servers in Kubernetes, e.g. `.arcadedb.default.svc.cluster.local` | (empty) |