Graph OLAP Engine

Available since ArcadeDB v26.4.1.

The Graph OLAP Engine maintains a read-optimized, columnar representation of your graph alongside the live OLTP data. It uses Compressed Sparse Row (CSR) encoding and flat primitive arrays to deliver 5x–400x speedups on analytical workloads — multi-hop traversals, graph algorithms, and property aggregations — without sacrificing transactional safety.

Why Graph OLAP?

ArcadeDB’s OLTP engine is optimized for point lookups and ACID transactions. Analytical workloads — PageRank, community detection, multi-hop traversals — access millions of edges in tight loops. The row-oriented, pointer-chasing nature of OLTP storage causes cache misses, object overhead, and GC pressure.

The OLAP engine solves this by encoding graph topology as flat int[] arrays and properties as typed columns:

  • Sequential memory access — cache-line friendly, no pointer chasing

  • Zero object allocation — no GC pressure during traversal

  • SIMD-friendly — enables JVM vectorized operations

  • 9x more compact — flat arrays vs. Java object overhead

Graph Analytical View (GAV)

A Graph Analytical View is a named, schema-persisted OLAP snapshot of selected vertex types, edge types, and properties.

GraphAnalyticalView gav = GraphAnalyticalView.builder(database)
    .withName("social")
    .withVertexTypes("Person", "Company")
    .withEdgeTypes("FOLLOWS", "WORKS_AT")
    .withProperties("name", "age", "status")
    .withUpdateMode(UpdateMode.SYNCHRONOUS)
    .build();

Named views are persisted in schema.json and automatically restored on database restart.

SQL

Creating a view

CREATE GRAPH ANALYTICAL VIEW social
  VERTEX TYPES (Person, Company)
  EDGE TYPES (FOLLOWS, WORKS_AT)
  PROPERTIES (name, age, status)
  UPDATE MODE SYNCHRONOUS

All clauses after the view name are optional. A minimal view covering the entire graph:

CREATE GRAPH ANALYTICAL VIEW fullGraph

Use IF NOT EXISTS to avoid errors if the view already exists:

CREATE GRAPH ANALYTICAL VIEW social IF NOT EXISTS
  VERTEX TYPES (Person)
  EDGE TYPES (FOLLOWS)
  UPDATE MODE SYNCHRONOUS

You can also materialize edge properties (e.g., weights):

CREATE GRAPH ANALYTICAL VIEW weighted
  VERTEX TYPES (City)
  EDGE TYPES (ROAD)
  EDGE PROPERTIES (distance, toll)
  UPDATE MODE SYNCHRONOUS
  COMPACTION THRESHOLD 50000

Altering a view

Change the update mode or compaction threshold of an existing view:

ALTER GRAPH ANALYTICAL VIEW social UPDATE MODE ASYNCHRONOUS
ALTER GRAPH ANALYTICAL VIEW social COMPACTION THRESHOLD 20000

Rebuilding a view

Force a full rebuild of the CSR snapshot:

REBUILD GRAPH ANALYTICAL VIEW social

Dropping a view

DROP GRAPH ANALYTICAL VIEW social
DROP GRAPH ANALYTICAL VIEW IF EXISTS social

Listing views

SELECT FROM schema:graphAnalyticalViews

Builder Options

Method Description Default

withName(String)

Named registration + schema persistence

anonymous

withVertexTypes(String…​)

Filter to specific vertex types

all

withEdgeTypes(String…​)

Filter to specific edge types

all

withProperties(String…​)

Materialize specific vertex properties

all

withEdgeProperties(String…​)

Materialize edge properties (e.g., weights)

none

withUpdateMode(UpdateMode)

OFF, SYNCHRONOUS, or ASYNCHRONOUS

OFF

withCompactionThreshold(int)

Rebuild CSR after N accumulated delta edges

10,000

Async Build for Large Graphs

For large graphs, use buildAsync() to avoid blocking the calling thread:

GraphAnalyticalView gav = GraphAnalyticalView.builder(database)
    .withName("large-graph")
    .withUpdateMode(UpdateMode.ASYNCHRONOUS)
    .buildAsync();

// Wait for build completion
boolean ready = gav.awaitReady(30, TimeUnit.SECONDS);

Update Modes

The GAV supports three synchronization modes between OLTP and OLAP:

Mode Behavior Staleness Use Case

OFF

Marks view STALE on commit; requires manual rebuild

Until rebuild

Batch analytics, static snapshots

SYNCHRONOUS

Applies an overlay on each commit

Zero

Real-time analytics, consistent reads

ASYNCHRONOUS

Triggers background rebuild on commit

Brief BUILDING window

Large graphs, tolerable brief inconsistency

In SYNCHRONOUS mode, the engine captures transaction deltas (new/deleted vertices, added/removed edges, property changes) and merges them into an immutable overlay on top of the base CSR. Readers always see a consistent snapshot via an atomic volatile reference swap. When the overlay accumulates too many changes (configurable threshold, default 10,000 edges), a background compaction rebuilds the full CSR.

How CSR Works

The graph topology is stored as two pairs of arrays (forward for outgoing edges, backward for incoming):

Forward CSR (outgoing edges):
  offsets:   [0, 3, 5, 8, ...]     -- one entry per vertex + sentinel
  neighbors: [1, 5, 7, 2, 6, ...]  -- dense neighbor IDs, contiguous per source

  Outgoing neighbors of vertex v = neighbors[offsets[v] .. offsets[v+1])
  Out-degree of vertex v         = offsets[v+1] - offsets[v]   -- O(1)

This layout enables sequential memory access (cache-line friendly) and O(1) degree lookups.

Columnar Property Storage

Properties are stored as typed flat arrays — int[], long[], double[], or dictionary-encoded int[] for strings. Each column has a compact null bitmap (1 bit per vertex). Dictionary encoding maps unique string values to integer codes, achieving near-100% compression for low-cardinality fields.

Memory Usage

The OLAP representation is significantly more compact than the OLTP equivalent:

  • CSR topology: ~8 bytes per edge (bidirectional)

  • Node ID mapping: ~8 bytes per vertex

  • Columnar properties: 4–8 bytes per vertex per column

  • Null bitmaps: 1 bit per vertex per column

Example: for a graph with 500K vertices and 8M edges, the GAV uses 134.6 MB compared to an estimated ~1.2 GB for the OLTP representation — 9.3x more compact.

long bytes = gav.getMemoryUsageBytes();

Graph Algorithms

The module includes parallelized graph algorithms that operate directly on CSR arrays with zero GC pressure:

Algorithm Description

PageRank

Pull-based, parallel, configurable damping factor and iterations

Connected Components

Min-label propagation for weakly connected components

BFS

Breadth-first search with distance arrays

SSSP (Dijkstra)

Single-source shortest path for weighted graphs

Label Propagation

Community detection

Triangle Counting

Count 3-cliques in the graph

Local Clustering Coefficient

Per-vertex clustering coefficients

GraphAlgorithms algos = new GraphAlgorithms();

// PageRank (20 iterations, damping 0.85)
double[] ranks = algos.pageRank(gav, 20, 0.85);

// Connected Components
int[] components = algos.connectedComponents(gav);

// BFS from a source vertex
int[] distances = algos.bfs(gav, sourceNodeId);

Query Planner Integration

The Cypher query planner automatically detects ready GAVs and substitutes OLTP traversal operators with CSR-based operators when:

  • A named GAV is registered and in READY state

  • The GAV covers the required vertex and edge types

  • The query does not return edge variables as first-class records (edges in CSR have no RID; edge properties are fully supported)

No query changes are needed — the optimizer transparently accelerates matching traversal patterns.

Lifecycle

// Check status
if (gav.isReady()) { /* safe to query */ }

// Status values: NOT_BUILT, BUILDING, READY, STALE
Status status = gav.getStatus();

// Drop (removes from registry + schema)
gav.drop();

// Shutdown (release resources, schema definition persists)
gav.shutdown();

Benchmark Results

On a graph with 500K vertices and ~8M edges:

Benchmark OLTP OLAP Speedup

1-hop count

6.9 µs

1.2 µs

5.7x

2-hop

101.4 µs

5.1 µs

19.8x

3-hop

1,037 µs

56.4 µs

18.4x

5-hop

194,046 µs

5,141 µs

37.7x

Shortest Path

394 ms/pair

7.5 ms/pair

52.8x

PageRank (20 iter)

124,563 ms

316 ms

394.2x

Connected Components

5,591 ms

197 ms

28.4x

Label Propagation

62,619 ms

645 ms

97.1x

Limitations

  • CSR uses int[] arrays — maximum ~2.1 billion vertices per bucket and ~2.1 billion edges per direction

  • Edges in CSR do not carry their own RID; the Cypher query planner falls back to OLTP only when the query returns an edge variable as a first-class record (e.g., RETURN r). Edge properties are fully supported via withEdgeProperties()

  • Dictionary encoding applies only to string properties

  • Initial build requires a full scan of selected vertex/edge types