Graph Importer
The GraphImporter is a high-performance, declarative graph importer that uses a two-pass CSR-first architecture to bulk-load graph data from XML, CSV, and JSONL files into ArcadeDB.
It is designed for importing large datasets (millions of vertices and edges) with minimal memory overhead.
The importer is located in the integration module (com.arcadedb.integration.importer.graph.GraphImporter).
How It Works
The import runs in two passes:
- Pass 1: Process each data source once, creating vertices with full properties and collecting the graph topology as compressed int arrays.
- Pass 2: Create all edges from the in-memory topology using GraphBatch, one batch per edge type, with bidirectional edges for full IN+OUT traversal.
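To make the "topology as int arrays" idea concrete, here is a minimal, self-contained sketch of a compressed sparse row (CSR) adjacency structure built from an edge list. It is not the importer's actual code: the class name CsrSketch and its methods are hypothetical, and vertex ids are assumed to be dense integers starting at 0.

```java
import java.util.Arrays;

/** Illustrative only: builds a CSR adjacency structure from parallel from/to arrays. */
public class CsrSketch {
    /** The neighbours of vertex v are targets[offsets[v]] .. targets[offsets[v + 1] - 1]. */
    public final int[] offsets;
    public final int[] targets;

    private CsrSketch(int[] offsets, int[] targets) {
        this.offsets = offsets;
        this.targets = targets;
    }

    public static CsrSketch build(int vertexCount, int[] from, int[] to) {
        // Count the out-degree of each vertex, shifted by one slot.
        int[] offsets = new int[vertexCount + 1];
        for (int src : from)
            offsets[src + 1]++;
        // A prefix sum turns the counts into start offsets.
        for (int v = 0; v < vertexCount; v++)
            offsets[v + 1] += offsets[v];
        // Scatter every edge target into the slice that belongs to its source vertex.
        int[] cursor = Arrays.copyOf(offsets, vertexCount);
        int[] targets = new int[to.length];
        for (int i = 0; i < from.length; i++)
            targets[cursor[from[i]]++] = to[i];
        return new CsrSketch(offsets, targets);
    }

    public int[] neighbours(int v) {
        return Arrays.copyOfRange(targets, offsets[v], offsets[v + 1]);
    }

    public static void main(String[] args) {
        // Edges 0->1, 0->2 and 2->1 over three vertices.
        CsrSketch csr = build(3, new int[] {0, 0, 2}, new int[] {1, 2, 1});
        System.out.println(Arrays.toString(csr.offsets));
        System.out.println(Arrays.toString(csr.neighbours(0)));
    }
}
```

Two flat int arrays hold the whole topology, which is why this layout stays compact even for millions of edges; a second pass can then walk each vertex's slice to create the edges.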
Command-Line Usage
java -cp arcadedb-integration-*.jar com.arcadedb.integration.importer.graph.GraphImporter \
<json-config-file> <database-path> [data-dir]
- json-config-file: Path to the JSON configuration file (see JSON Configuration below)
- database-path: Path where the database will be created (any existing database at this path is deleted)
- data-dir: Optional base directory for resolving relative file paths in the JSON config (defaults to the JSON file’s parent directory)
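The data-dir default described above can be pictured with a small sketch. This is an assumption about how such a rule might be written, not the importer's source; DataDirSketch, resolveDataDir, and resolveSource are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Illustrative only: mirrors the documented argument order and data-dir default. */
public class DataDirSketch {
    /** args = { json-config-file, database-path, [data-dir] } */
    public static Path resolveDataDir(String[] args) {
        if (args.length > 2)
            return Paths.get(args[2]);
        // No data-dir given: fall back to the directory that contains the JSON file.
        return Paths.get(args[0]).toAbsolutePath().getParent();
    }

    /** Resolves a "file" entry from the configuration against the data directory. */
    public static Path resolveSource(Path dataDir, String file) {
        return dataDir.resolve(file).normalize();
    }

    public static void main(String[] args) {
        Path dataDir = resolveDataDir(new String[] {"imports/stackoverflow.json", "databases/so"});
        System.out.println(resolveSource(dataDir, "Posts.xml"));
    }
}
```

Because Path.resolve returns its argument unchanged when that argument is absolute, absolute file entries would bypass the data directory in this sketch.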
Java API
The importer can also be used programmatically via a fluent Builder API:
try (GraphImporter importer = GraphImporter.builder(database)
    .vertex("User", new CsvRowSource("users.csv"), v -> {
      v.id("Id");
      v.intProperty("reputation", "Reputation");
      v.property("name", "DisplayName");
    })
    .vertex("Question", new XmlRowSource("posts.xml"), v -> {
      v.id("Id");
      v.filter("PostTypeId", "1");
      v.property("title", "Title");
      v.edgeIn("OwnerUserId", "ASKED", "User");
      v.splitEdge("Tags", "TAGGED_WITH", "Tag", "|");
    })
    .edgeSource("LINKED_TO", new CsvRowSource("links.csv"), e -> {
      e.from("PostId", "Question");
      e.to("RelatedId", "Question");
      e.intProperty("linkType", "LinkTypeId");
    })
    .limit(10000)
    .build()) {
  importer.run();
  System.out.printf("Vertices: %,d, Edges: %,d%n",
      importer.getVertexCount(), importer.getEdgeCount());
}
Or from a JSON configuration file:
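import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Read the configuration as UTF-8 instead of relying on the platform default charset
String json = new String(Files.readAllBytes(jsonFile.toPath()), StandardCharsets.UTF_8);
JSONObject config = new JSONObject(json);
GraphImporter.createSchemaFromConfig(database, config);
try (GraphImporter importer = GraphImporter.fromJSON(database, config, dataDir)) {
  importer.run();
}
// Run the optional postImportCommands once the import has finished
GraphImporter.executePostImportCommands(database, config);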
JSON Configuration
The JSON configuration file defines vertex types, edge types, data sources, property mappings, and optional post-import commands.
Vertex Definitions
Each entry in the vertices array defines a vertex type and its data source:
| Key | Required | Description |
|---|---|---|
| type | Yes | ArcadeDB vertex type name (auto-created if it does not exist) |
| file | Yes | Source file path, relative to the data directory. The format (XML, CSV, or JSONL) is auto-detected from the file extension |
| id | Yes | Source attribute used as integer primary key for edge resolution between types |
| nameId | No | String-based secondary key, used by |
| filter | No | Row filter in the format attribute=value (e.g., PostTypeId=1) |
|  | No | For XML files: element name to read (defaults to |
| properties | No | Maps ArcadeDB property names to source attributes. Supports type prefixes such as int: (e.g., int:Reputation) |
| edges | No | Array of edge definitions derived from foreign key attributes in this vertex’s source file |
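As a rough illustration of how a properties mapping and a filter could be applied to one source row, consider the sketch below. It assumes only what the complete example on this page shows: the int: prefix and the attribute=value filter form. PropertyMappingSketch and its methods are hypothetical, and treating every unprefixed mapping as a plain string is an assumption.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: applies "prefix:attribute" mappings and an "attribute=value" filter. */
public class PropertyMappingSketch {
    /** Returns true when the row carries exactly the value named in the filter. */
    public static boolean matches(String filter, Map<String, String> row) {
        int eq = filter.indexOf('=');
        if (eq < 0)
            throw new IllegalArgumentException("Expected attribute=value but got: " + filter);
        return filter.substring(eq + 1).equals(row.get(filter.substring(0, eq)));
    }

    /** Converts one source row into typed properties. */
    public static Map<String, Object> apply(Map<String, String> mappings, Map<String, String> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> mapping : mappings.entrySet()) {
            String spec = mapping.getValue();
            int colon = spec.indexOf(':');
            String prefix = colon < 0 ? "" : spec.substring(0, colon);
            String raw = row.get(spec.substring(colon + 1));
            if (raw == null)
                continue; // attribute missing from this row
            // Only "int" is handled here; Integer.valueOf throws NumberFormatException on bad input.
            out.put(mapping.getKey(), "int".equals(prefix) ? (Object) Integer.valueOf(raw) : raw);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("Reputation", "int:Reputation");
        mappings.put("DisplayName", "DisplayName");
        Map<String, String> row = new HashMap<>();
        row.put("Reputation", "42");
        row.put("DisplayName", "Ada");
        System.out.println(apply(mappings, row));
    }
}
```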
Edge Definitions (within a vertex)
Each entry in a vertex’s edges array defines how to create edges from foreign key attributes:
| Key | Required | Description |
|---|---|---|
| attribute | Yes | Source attribute containing the foreign key value |
| edge | Yes | ArcadeDB edge type name (auto-created if it does not exist) |
| target | Yes | Target vertex type the foreign key references |
| direction | No | Edge direction (e.g., "in", as used in the complete example) |
| split | No | Delimiter for multi-value fields (e.g., the pipe character) |
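A multi-value field such as Tags has to be cut into individual keys before each one can become an edge. The sketch below shows one way to do that safely with a literal delimiter. It is not the importer's code: SplitSketch is a hypothetical name, and trimming values and skipping empty ones are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative only: splits a multi-value attribute on a literal delimiter. */
public class SplitSketch {
    public static List<String> splitValues(String raw, String delimiter) {
        List<String> values = new ArrayList<>();
        if (raw == null)
            return values;
        // Pattern.quote matters: "|" is a regex metacharacter and would otherwise
        // split the input between every single character.
        for (String part : raw.split(Pattern.quote(delimiter))) {
            String value = part.trim();
            if (!value.isEmpty())
                values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(splitValues("java|graph|arcadedb", "|"));
    }
}
```

Each resulting value would then be looked up against the target vertex type to create one edge per value.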
Edge-Only Sources
The edgeSources array defines edges where both endpoints already exist as vertices. No vertices are created from these sources:
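| Key | Required | Description |
|---|---|---|
| edge | Yes | ArcadeDB edge type name |
| file | Yes | Source file path |
| from | Yes | Compact format attribute:VertexType (e.g., PostId:Question) |
| to | Yes | Compact format attribute:VertexType (e.g., RelatedPostId:Question) |
| properties | No | Property mappings (same format as vertex properties) |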
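The from and to values in the complete example on this page pair a source attribute with a vertex type, as in PostId:Question. A minimal parser for that shape might look like the sketch below; EndpointSketch is a hypothetical class, and rejecting values without a colon is an assumption about the desired behavior.

```java
/** Illustrative only: parses an "attribute:VertexType" endpoint such as "PostId:Question". */
public class EndpointSketch {
    public final String attribute;
    public final String vertexType;

    private EndpointSketch(String attribute, String vertexType) {
        this.attribute = attribute;
        this.vertexType = vertexType;
    }

    public static EndpointSketch parse(String spec) {
        int colon = spec.indexOf(':');
        // Both halves must be present: reject ":Question", "PostId:" and "PostId".
        if (colon <= 0 || colon == spec.length() - 1)
            throw new IllegalArgumentException("Expected attribute:VertexType but got: " + spec);
        return new EndpointSketch(spec.substring(0, colon), spec.substring(colon + 1));
    }

    public static void main(String[] args) {
        EndpointSketch from = parse("PostId:Question");
        System.out.println(from.attribute + " -> " + from.vertexType);
    }
}
```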
General Options
| Key | Required | Description |
|---|---|---|
|  | No | Maximum records per source (for testing). Omit or set to 0 for unlimited |
Post-Import Commands
The postImportCommands array defines commands to execute automatically after the graph import completes.
This is useful for creating indexes, analytical views, or running any database command that depends on the imported data being present.
| Key | Required | Description |
|---|---|---|
| language | Yes | Query language to use (e.g., sql) |
| command | Yes | The command text to execute |
Commands are executed sequentially in the order they appear. If a command fails, a warning is logged and the remaining commands continue to execute.
If any post-import command triggers an asynchronous Graph Analytical View build, the importer automatically waits (up to 10 minutes) for all views to reach READY status before returning.
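The "log a warning and keep going" behavior described above can be sketched as a simple loop. This is not the importer's implementation: PostImportSketch is hypothetical, the BiConsumer stands in for whatever actually executes a command against the database, and collecting warnings in a list replaces real logging.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

/** Illustrative only: runs commands in order and continues after a failure. */
public class PostImportSketch {
    /** Each command is a {language, command} pair; the result lists one warning per failure. */
    public static List<String> runAll(List<String[]> commands, BiConsumer<String, String> executor) {
        List<String> warnings = new ArrayList<>();
        for (String[] entry : commands) {
            try {
                executor.accept(entry[0], entry[1]);
            } catch (RuntimeException e) {
                // A failed command is recorded and does not stop the remaining ones.
                warnings.add("Command failed: " + entry[1] + " (" + e.getMessage() + ")");
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        List<String[]> commands = Arrays.asList(
            new String[] {"sql", "BROKEN COMMAND"},
            new String[] {"sql", "CREATE INDEX ON Question (Id) UNIQUE"});
        List<String> warnings = runAll(commands, (language, command) -> {
            // Stand-in executor: fails on the first command to show the loop continuing.
            if (command.startsWith("BROKEN"))
                throw new IllegalStateException("syntax error");
            System.out.println("Executed [" + language + "] " + command);
        });
        System.out.println(warnings);
    }
}
```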
Example:
"postImportCommands": [
  {
    "language": "sql",
    "command": "CREATE INDEX ON Question (Id) UNIQUE"
  },
  {
    "language": "sql",
    "command": "CREATE GRAPH ANALYTICAL VIEW IF NOT EXISTS myGraph PROPERTIES (`!Body`, `!Text`) UPDATE MODE SYNCHRONOUS"
  }
]
Complete Example
Below is a complete JSON configuration for importing a StackOverflow data dump:
{
  "vertices": [
    {
      "type": "Tag", "file": "Tags.xml", "id": "Id", "nameId": "TagName",
      "properties": { "Id": "int:Id", "TagName": "TagName", "Count": "int:Count" }
    },
    {
      "type": "User", "file": "Users.xml", "id": "Id",
      "properties": {
        "Id": "int:Id", "DisplayName": "DisplayName", "Reputation": "int:Reputation",
        "CreationDate": "CreationDate", "Views": "int:Views"
      }
    },
    {
      "type": "Question", "file": "Posts.xml", "id": "Id", "filter": "PostTypeId=1",
      "properties": {
        "Id": "int:Id", "Title": "Title", "Body": "Body",
        "Score": "int:Score", "ViewCount": "int:ViewCount", "Tags": "Tags"
      },
      "edges": [
        { "attribute": "OwnerUserId", "edge": "ASKED", "target": "User", "direction": "in" },
        { "attribute": "Tags", "edge": "TAGGED_WITH", "target": "Tag", "split": "|" }
      ]
    },
    {
      "type": "Answer", "file": "Posts.xml", "id": "Id", "filter": "PostTypeId=2",
      "properties": {
        "Id": "int:Id", "Body": "Body", "Score": "int:Score"
      },
      "edges": [
        { "attribute": "OwnerUserId", "edge": "ANSWERED", "target": "User", "direction": "in" },
        { "attribute": "ParentId", "edge": "HAS_ANSWER", "target": "Question", "direction": "in" }
      ]
    }
  ],
  "edgeSources": [
    {
      "edge": "ACCEPTED_ANSWER", "file": "Posts.xml",
      "from": "Id:Question", "to": "AcceptedAnswerId:Answer"
    },
    {
      "edge": "LINKED_TO", "file": "PostLinks.xml",
      "from": "PostId:Question", "to": "RelatedPostId:Question",
      "properties": { "LinkType": "int:LinkTypeId" }
    }
  ],
  "postImportCommands": [
    {
      "language": "sql",
      "command": "CREATE GRAPH ANALYTICAL VIEW IF NOT EXISTS stackoverflow PROPERTIES (`!Body`, `!Text`) UPDATE MODE SYNCHRONOUS"
    }
  ]
}