Neo4j Importer

ArcadeDB can import a database exported from Neo4j in JSONL format (one JSON object per line).

To export a Neo4j database, follow the instructions in Export in JSON. The resulting file contains one JSON object per line.

Performance

The Neo4j importer uses the high-performance GraphBatch API internally:

  • Vertices are created with pre-allocated edge segments, eliminating lazy allocation during edge creation.

  • Edges are buffered in flat primitive arrays and flushed sorted by source vertex, converting random I/O into sequential I/O. Light edges (no edge record on disk) are used when an edge has no properties.

  • The write-ahead log (WAL) is disabled during import for maximum throughput.

  • ID mapping uses a primitive long[]-based hash map when Neo4j IDs are numeric (the common case with APOC exports), which needs only ~24 bytes per vertex. For non-numeric IDs, the importer automatically falls back to a standard HashMap. This makes it possible to import databases with hundreds of millions of vertices on a modest heap: the numeric mapping for 100 million vertices takes about 2.2 GB.
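The edge-buffering idea in the list above can be illustrated with a minimal plain-Java sketch. It is not the GraphBatch API: the class EdgeBuffer, its EdgeSink callback and the add/flush methods are hypothetical names, and vertex identities are assumed to be packed into a single long. Edge endpoints live in two parallel long[] arrays (no per-edge objects) and are sorted by source, then by target, before being drained, so a writer receiving them visits each source vertex's edges consecutively.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: buffers edges as (source, target) pairs in two flat long[] arrays
 * and drains them ordered by source vertex, so consecutive writes hit the same source record.
 */
final class EdgeBuffer {
  /** Receives edges in sorted order; a real importer would write them to storage. */
  interface EdgeSink { void accept(long source, long target); }

  private long[] sources;
  private long[] targets;
  private int size;

  EdgeBuffer(int initialCapacity) {
    sources = new long[initialCapacity];
    targets = new long[initialCapacity];
  }

  void add(long source, long target) {
    if (size == sources.length) { // grow both arrays together
      int newCapacity = Math.max(16, sources.length * 2);
      sources = Arrays.copyOf(sources, newCapacity);
      targets = Arrays.copyOf(targets, newCapacity);
    }
    sources[size] = source;
    targets[size] = target;
    size++;
  }

  int size() { return size; }

  /** Sorts the buffered pairs by (source, target), hands them to the sink, then empties the buffer. */
  void flush(EdgeSink sink) {
    sort(0, size - 1);
    for (int i = 0; i < size; i++) sink.accept(sources[i], targets[i]);
    size = 0;
  }

  // In-place quicksort over the two parallel arrays; recursing into the smaller half bounds stack depth.
  private void sort(int lo, int hi) {
    while (lo < hi) {
      int p = partition(lo, hi);
      if (p - lo < hi - p) { sort(lo, p - 1); lo = p + 1; }
      else { sort(p + 1, hi); hi = p - 1; }
    }
  }

  // Lomuto partition using the middle element (moved to hi) as pivot.
  private int partition(int lo, int hi) {
    swap(lo + (hi - lo) / 2, hi);
    long pivotSource = sources[hi], pivotTarget = targets[hi];
    int i = lo;
    for (int j = lo; j < hi; j++) {
      if (sources[j] < pivotSource || (sources[j] == pivotSource && targets[j] < pivotTarget)) {
        swap(i, j);
        i++;
      }
    }
    swap(i, hi);
    return i;
  }

  private void swap(int a, int b) {
    long s = sources[a]; sources[a] = sources[b]; sources[b] = s;
    long t = targets[a]; targets[a] = targets[b]; targets[b] = t;
  }
}
```

For instance, adding (5,1), (2,9), (5,0), (1,3) and flushing delivers them as (1,3), (2,9), (5,0), (5,1). The hand-written sort keeps the data primitive; sorting a boxed index array with a Comparator would be shorter but would allocate an object per edge.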

Multi-label handling

Neo4j supports multiple labels per node, while in ArcadeDB a node (vertex) has exactly one type. The Neo4j importer simulates multiple labels by creating a new type named after all the node's labels, joined with ~: <label1>[~<labelN>]*. For example:

{"type":"node","id":"1","labels":["User", "Administrator"],"properties":{"name":"Jim","age":42}}

This vertex is created in ArcadeDB with the type "Administrator~User" (the labels are always sorted alphabetically), which extends both the "Administrator" and "User" types.
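The naming rule can be expressed as a small helper. MultiLabelTypes is a hypothetical class, not part of ArcadeDB; it assumes "sorted alphabetically" means Java's natural String order (which is case-sensitive), and it does not handle nodes without labels.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper mirroring the multi-label naming rule described above. */
final class MultiLabelTypes {
  static final String SEPARATOR = "~";

  /** Returns the single ArcadeDB type name for a Neo4j node with the given labels. */
  static String typeName(List<String> labels) {
    if (labels.isEmpty()) throw new IllegalArgumentException("this sketch does not handle unlabeled nodes");
    List<String> sorted = new ArrayList<>(labels);
    sorted.sort(null); // natural String order (assumption: plain, case-sensitive ordering)
    return String.join(SEPARATOR, sorted);
  }

  /** Supertypes of the composite type: one per label; a single-label type has none. */
  static List<String> superTypes(List<String> labels) {
    if (labels.size() < 2) return List.of();
    List<String> sorted = new ArrayList<>(labels);
    sorted.sort(null);
    return sorted;
  }
}
```

For the node above, typeName(List.of("User", "Administrator")) returns "Administrator~User", and superTypes returns [Administrator, User], the two types the composite type extends.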

[Figure: Neo4jInheritance]

This way you can use ArcadeDB's polymorphism: querying the type "User" returns the records of "User" and of all its subtypes, including "Administrator~User".

Importing via SQL

To import a database, use the IMPORT DATABASE command from the API, Studio, or Console. The example below imports Neo4j's PanamaPapers database using the ArcadeDB Console.

> CREATE DATABASE PanamaPapers
{PanamaPapers}> IMPORT DATABASE file:///temp/panama-papers-neo4j.jsonl
ArcadeDB 26.5.1 - Neo4j Importer
Importing Neo4j database from file 'panama-papers-neo4j.jsonl' to 'databases/PanamaPapers'
- Creation of the schema: types, properties and indexes
- Creation of vertices started
- Creation of vertices completed: created 3 vertices, skipped 1 edges (0 vertices/sec elapsed=0 secs)
- ID mapping mode: numeric (primitive long[])
- Creation of edges started: creating edges between vertices
- Creation of edges completed: created 1 edges, (0 edges/sec elapsed=0 secs)
***************************************************************************************************
Import of Neo4j database completed in 0 secs with 0 errors and 0 warnings.

Importing via command line

The Neo4j importer can also be run directly from the command line. Quote "lib/*" so that the shell passes the classpath wildcard to Java instead of expanding it:

java -cp "lib/*" com.arcadedb.integration.importer.Neo4jImporter -i <input-file> -d <database-path> [options]

Options:

Option              | Default    | Description
--------------------|------------|------------------------------------------------------------
-i <file>           | (required) | Path to the Neo4j JSONL export file
-d <path>           | (required) | Path where the ArcadeDB database will be created
-o                  | false      | Overwrite the database if it already exists
-b <size>           | 10,000     | Number of records per transaction batch
-decimalType <type> | DECIMAL    | Type for decimal values: FLOAT, DOUBLE, or DECIMAL
-bucketBits <n>     | 10         | Bits allocated for bucket IDs in the internal RID packing. The default supports up to 1,023 buckets, which is sufficient for most databases. Increase this value only if you have a very large number of types and buckets (e.g. -bucketBits 16 supports up to 65,535 buckets)
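To picture what -bucketBits controls, here is a sketch of packing a bucket ID and a record position into one 64-bit long. Both the PackedRid class and the layout (bucket ID in the low bucketBits bits, position in the remaining high bits) are assumptions for illustration; the importer's real encoding may differ. The largest value that fits in 10 and 16 bits is 1,023 and 65,535, the figures quoted in the table.

```java
/**
 * Hypothetical sketch: packs (bucketId, position) into one long, reserving the
 * low bucketBits bits for the bucket ID. Not ArcadeDB's actual RID encoding.
 */
final class PackedRid {
  private final int bucketBits;
  private final long bucketMask;

  PackedRid(int bucketBits) {
    if (bucketBits < 1 || bucketBits > 31) throw new IllegalArgumentException("bucketBits out of range");
    this.bucketBits = bucketBits;
    this.bucketMask = (1L << bucketBits) - 1;
  }

  /** Largest value that fits in bucketBits bits: 1,023 for 10 bits, 65,535 for 16 bits. */
  long maxBucketId() { return bucketMask; }

  long pack(int bucketId, long position) {
    if (bucketId < 0 || bucketId > bucketMask)
      throw new IllegalArgumentException("bucket id does not fit in " + bucketBits + " bits");
    // Keep the shifted position non-negative so the packed value stays positive.
    if (position < 0 || position > (Long.MAX_VALUE >>> bucketBits))
      throw new IllegalArgumentException("position too large");
    return (position << bucketBits) | bucketId;
  }

  int bucketId(long packed) { return (int) (packed & bucketMask); }

  long position(long packed) { return packed >>> bucketBits; }
}
```

Under this layout, every extra bucket bit doubles the number of addressable buckets but halves the position range left for each bucket, which is why raising the value only makes sense when you really have that many buckets.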

Example:

java -cp "lib/*" com.arcadedb.integration.importer.Neo4jImporter \
  -i /data/neo4j-export.jsonl -d /data/arcadedb/mydb -o -decimalType DOUBLE

Memory considerations

For large imports (hundreds of millions of vertices), the main memory consumer is the ID mapping table that translates Neo4j node IDs to ArcadeDB record IDs. The table size depends on the ID format:

ID format                   | Memory per vertex | Example: 100M vertices
----------------------------|-------------------|-----------------------
Numeric (e.g. "0", "12345") | ~24 bytes         | ~2.2 GB
String (e.g. "node-abc")    | ~140 bytes        | ~13 GB
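The table figures follow from multiplying the vertex count by the per-vertex cost. A tiny estimator (IdMapEstimate is a hypothetical name; the byte costs are the approximate values from the table) shows that the quoted results correspond to binary gigabytes: 100M × 24 bytes = 2.4 × 10^9 bytes ≈ 2.2 GiB, and 100M × 140 bytes = 1.4 × 10^10 bytes ≈ 13 GiB.

```java
/** Hypothetical back-of-the-envelope estimate of the ID mapping table size. */
final class IdMapEstimate {
  static final long NUMERIC_BYTES_PER_VERTEX = 24;  // ~24 bytes: primitive long[] map
  static final long STRING_BYTES_PER_VERTEX = 140;  // ~140 bytes: string-based HashMap

  /** Estimated size in GiB (2^30 bytes) for the given vertex count and per-vertex cost. */
  static double gib(long vertices, long bytesPerVertex) {
    return vertices * (double) bytesPerVertex / (1L << 30);
  }
}
```

The same arithmetic gives about 11.2 GiB (12 × 10^9 bytes) for 500M numeric IDs, which is the mapping table alone; heap for ArcadeDB's own buffers comes on top.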

Neo4j APOC exports use numeric IDs by default, so most imports will use the compact primitive map. If the importer encounters a non-numeric ID, it automatically migrates to the string-based map and logs a message:

- Non-numeric Neo4j ID detected, switching to string-based ID mapping

For very large imports, allocate enough heap memory. For example, importing a database with 500M vertices using numeric IDs needs approximately 12 GB (500M × ~24 bytes) for the ID mapping table alone, plus memory for ArcadeDB’s internal buffers. A setting of -Xmx24G or more is recommended, for example:

java -Xmx24G -cp "lib/*" com.arcadedb.integration.importer.Neo4jImporter -i <input-file> -d <database-path>
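The numeric-first mapping with automatic fallback can be sketched as follows. This is a hypothetical illustration, not the importer's source: the IdMapping class, its methods, the Long.MIN_VALUE empty-slot sentinel and the -1 "not found" value are all assumptions. Numeric IDs go into an open-addressing table built from two long[] arrays (keys and packed RIDs, 16 bytes per slot); at a load factor of 2/3 that works out to 16 / (2/3) = 24 bytes per vertex, in line with the figure above. The first non-numeric ID copies every entry into a HashMap<String, Long> and releases the arrays.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: Neo4j ID (as text) -> packed ArcadeDB RID, numeric-first with string fallback. */
final class IdMapping {
  private static final long EMPTY = Long.MIN_VALUE; // sentinel for free slots; this ID is treated as non-numeric
  private long[] keys = new long[16];
  private long[] values = new long[16];
  private int size;
  private Map<String, Long> fallback; // non-null once switched to string mode

  IdMapping() { Arrays.fill(keys, EMPTY); }

  boolean isNumeric() { return fallback == null; }

  void put(String neo4jId, long rid) {
    if (fallback == null) {
      Long numeric = parseOrNull(neo4jId);
      if (numeric != null && numeric != EMPTY) { putLong(numeric, rid); return; }
      switchToStrings(); // the real importer logs a message at this point
    }
    fallback.put(neo4jId, rid);
  }

  /** Returns the RID, or -1 when the ID is unknown. */
  long get(String neo4jId) {
    if (fallback != null) { Long v = fallback.get(neo4jId); return v == null ? -1 : v; }
    Long numeric = parseOrNull(neo4jId);
    if (numeric == null || numeric == EMPTY) return -1;
    int slot = find(numeric);
    return keys[slot] == EMPTY ? -1 : values[slot];
  }

  private static Long parseOrNull(String s) {
    try { return Long.parseLong(s); } catch (NumberFormatException e) { return null; }
  }

  // Linear probing over a power-of-two table; the multiply scrambles sequential IDs.
  private int find(long key) {
    int mask = keys.length - 1;
    int slot = Long.hashCode(key * 0x9E3779B97F4A7C15L) & mask;
    while (keys[slot] != EMPTY && keys[slot] != key) slot = (slot + 1) & mask;
    return slot;
  }

  private void putLong(long key, long rid) {
    if ((size + 1) * 3 > keys.length * 2) grow(); // keep the load factor at or below 2/3
    int slot = find(key);
    if (keys[slot] == EMPTY) { keys[slot] = key; size++; }
    values[slot] = rid;
  }

  private void grow() {
    long[] oldKeys = keys, oldValues = values;
    keys = new long[oldKeys.length * 2];
    values = new long[oldKeys.length * 2];
    Arrays.fill(keys, EMPTY);
    for (int i = 0; i < oldKeys.length; i++) {
      if (oldKeys[i] != EMPTY) { int s = find(oldKeys[i]); keys[s] = oldKeys[i]; values[s] = oldValues[i]; }
    }
  }

  private void switchToStrings() {
    fallback = new HashMap<>(Math.max(16, size * 2));
    for (int i = 0; i < keys.length; i++) {
      if (keys[i] != EMPTY) fallback.put(Long.toString(keys[i]), values[i]);
    }
    keys = null; // release the primitive arrays
    values = null;
  }
}
```

One consequence of this design: in numeric mode "007" and "7" resolve to the same key, whereas after the switch they are distinct strings. Right after doubling, the table is only about 1/3 full, so the actual cost per vertex in this sketch oscillates between roughly 24 and 48 bytes.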