# Node Embedding
Node embedding procedures map each vertex to a dense real-valued vector that encodes its structural position in the graph. The resulting vectors can be used directly as features for machine-learning tasks (classification, clustering, link prediction) or for similarity search using ArcadeDB's vector index.

All embedding procedures yield a `List<Float>` that can be stored as a vertex property or forwarded to a downstream vector index.
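As an illustration of the similarity-search use case, here is a minimal Python sketch of comparing two exported embedding vectors. The function is our own example, not part of the ArcadeDB API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The procedures below L2-normalise their output, so for stored
# embeddings the plain dot product already equals the cosine similarity.
```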
## algo.fastrp
| Property | Value |
|---|---|
| Procedure | `algo.fastrp` |
| Category | Node Embedding |
| Complexity | CPU |
| Min Args | 0 |
| Max Args | 1 |
### Syntax

```
CALL algo.fastrp([config]) YIELD node, embedding
```
### Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `config` | Map |  | Configuration map (see below) |
### Config Parameters

| Key | Type | Default | Description |
|---|---|---|---|
| `dimensions` | Integer |  | Embedding vector size |
| `iterations` | Integer |  | Number of propagation rounds |
|  | Float |  | Degree-normalisation exponent α: the weight of neighbour *j* contributing to node *i* is proportional to `deg(j)^α` |
|  | Float |  | Weight in [0, 1] given to the node's own previous embedding vs. the aggregated neighbour embedding |
|  | String | all types | Comma-separated edge type names |
|  | String |  | Edge traversal direction |
| `seed` | Long |  | Random seed; -1 = random |
### Yield Fields

| Field | Type | Description |
|---|---|---|
| `node` | Vertex | The vertex |
| `embedding` | List<Float> | L2-normalised embedding vector of length `dimensions` |
### Description

FastRP (Fast Random Projection, Chen et al. 2019) generates dense node embeddings without any training phase. Each node is initialised with a sparse ternary random vector (values ±√3 with probability 1/6 each, 0 with probability 2/3, the sparse random projection of Achlioptas 2003). The embedding is then iteratively refined by computing a weighted average of neighbour embeddings, followed by L2 normalisation. The resulting vectors capture multi-hop structural proximity, and the procedure runs orders of magnitude faster than walk-based methods.
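The following Python sketch illustrates these mechanics over an adjacency-list graph. It is an illustrative reimplementation under stated assumptions, not ArcadeDB's internal code, and the parameter defaults are placeholders:

```python
import math
import random

def l2_normalise(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def fastrp(adj, dimensions=128, iterations=3, alpha=0.0, self_weight=0.0, seed=42):
    """Sketch of FastRP over a graph given as {node: [neighbours]}."""
    rng = random.Random(seed)
    s3 = math.sqrt(3.0)

    def ternary():
        # +sqrt(3) w.p. 1/6, -sqrt(3) w.p. 1/6, 0 otherwise (Achlioptas 2003)
        r = rng.random()
        return s3 if r < 1 / 6 else (-s3 if r < 1 / 3 else 0.0)

    emb = {v: [ternary() for _ in range(dimensions)] for v in adj}
    for _ in range(iterations):
        nxt = {}
        for v, neighbours in adj.items():
            agg = [0.0] * dimensions
            for u in neighbours:
                w = len(adj[u]) ** alpha  # degree-normalisation weight deg(u)^alpha
                for d in range(dimensions):
                    agg[d] += w * emb[u][d]
            if neighbours:
                agg = [x / len(neighbours) for x in agg]
            # Blend the node's own previous embedding with the neighbour aggregate,
            # then L2-normalise, as in the description above.
            nxt[v] = l2_normalise([self_weight * a + (1 - self_weight) * b
                                   for a, b in zip(emb[v], agg)])
        emb = nxt
    return emb
```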
### Use Cases

- Fast baseline embeddings for downstream ML pipelines
- Similarity search via vector index
- Graph-aware feature generation without labelled data
### Example

```
CALL algo.fastrp({dimensions: 64, iterations: 3, seed: 42})
YIELD node, embedding
RETURN node.name AS name, embedding
```
### References

- Chen, H., Sultan, S. F., Tian, Y., Chen, M., Skiena, S.: "Fast and Accurate Network Embeddings via Very Sparse Random Projection", CIKM 2019
- Achlioptas, D.: "Database-friendly random projections: Johnson-Lindenstrauss with binary coins", Journal of Computer and System Sciences 66(4), 2003
## algo.node2vec
| Property | Value |
|---|---|
| Procedure | `algo.node2vec` |
| Category | Node Embedding |
| Complexity | CPU |
| Min Args | 0 |
| Max Args | 1 |
### Syntax

```
CALL algo.node2vec([config]) YIELD node, embedding
```
### Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `config` | Map |  | Configuration map (see below) |
### Config Parameters

| Key | Type | Default | Description |
|---|---|---|---|
| `embeddingDimension` | Integer |  | Embedding vector size |
| `walkLength` | Integer |  | Number of steps per random walk |
|  | Integer |  | Random walks generated per node |
|  | Integer |  | Training epochs over all walks |
|  | Integer |  | Skip-gram context window radius |
|  | Integer |  | Negative samples per positive (centre, context) pair |
|  | Float |  | Initial SGD learning rate (linearly decayed per epoch) |
| `p` | Float |  | Return parameter: high p → less likely to return to the previous node |
| `q` | Float |  | In-out parameter: low q → DFS-like exploration; high q → BFS-like |
|  | String | all types | Comma-separated edge type names |
|  | String |  | Edge traversal direction |
| `seed` | Long |  | Random seed; -1 = random |
### Yield Fields

| Field | Type | Description |
|---|---|---|
| `node` | Vertex | The vertex |
| `embedding` | List<Float> | L2-normalised embedding vector of length `embeddingDimension` |
### Description

Node2Vec (Grover & Leskovec, 2016) learns node embeddings using biased second-order random walks combined with a Skip-gram model trained via negative sampling. The walk bias is controlled by two parameters:

- p (return parameter): low p makes the walk more likely to revisit the previous node, keeping exploration local
- q (in-out parameter): low q makes the walk explore outward (DFS-like), capturing homophily and yielding community-oriented embeddings; high q keeps the walk close to the source (BFS-like), revealing structural roles

The Skip-gram model is trained with SGD and negative sampling, using a learning rate that decays linearly per epoch.
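A Python sketch of the biased second-order walk follows; the Skip-gram training step is omitted, and the function and graph representation are our own illustration, not ArcadeDB's implementation:

```python
import random

def biased_walk(adj, start, walk_length=30, p=1.0, q=1.0, rng=random):
    """One node2vec walk over a graph given as {node: [neighbours]}."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbours = adj[cur]
        if not neighbours:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is unbiased
            continue
        prev = walk[-2]
        prev_nbrs = set(adj[prev])
        weights = []
        for x in neighbours:
            if x == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif x in prev_nbrs:
                weights.append(1.0)       # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)   # move outward to distance 2
        walk.append(rng.choices(neighbours, weights=weights)[0])
    return walk
```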
### Use Cases

- Community-aware embeddings (low q)
- Structural-role embeddings (high q)
- Link prediction feature generation
- Node classification pre-training
### Example

```
CALL algo.node2vec({embeddingDimension: 64, p: 1.0, q: 0.5, walkLength: 30, seed: 42})
YIELD node, embedding
RETURN node.name AS name, embedding
```
### References

- Grover, A., Leskovec, J.: "node2vec: Scalable Feature Learning for Networks", KDD 2016
## algo.graphsage
| Property | Value |
|---|---|
| Procedure | `algo.graphsage` |
| Category | Node Embedding |
| Complexity | CPU |
| Min Args | 0 |
| Max Args | 1 |
### Syntax

```
CALL algo.graphsage([config]) YIELD node, embedding
```
### Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `config` | Map |  | Configuration map (see below) |
### Config Parameters

| Key | Type | Default | Description |
|---|---|---|---|
| `embeddingDimension` | Integer |  | Output embedding size per layer |
| `layers` | Integer |  | Number of aggregation layers (receptive field = `layers` hops) |
|  | String | all types | Comma-separated edge type names |
|  | String |  | Edge traversal direction |
| `seed` | Long |  | Random seed; -1 = random |
### Yield Fields

| Field | Type | Description |
|---|---|---|
| `node` | Vertex | The vertex |
| `embedding` | List<Float> | L2-normalised embedding vector of length `embeddingDimension` |
### Description

GraphSAGE (Hamilton et al., 2017) performs inductive neighbourhood aggregation. This implementation is unsupervised: no training labels are needed. Node features are initialised from structural properties (log-normalised degree plus Gaussian noise) and then propagated through `layers` rounds of mean aggregation, each followed by a random linear projection (Xavier initialisation) and ReLU activation. The resulting embeddings capture multi-hop structural similarity.

Because the projection matrices are randomly initialised rather than trained, the embeddings are structurally consistent (nodes with similar neighbourhoods receive similar embeddings) and can be used directly as ML features or refined with downstream fine-tuning.
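A Python sketch of this untrained propagation follows. Concatenating the self vector with the neighbour mean follows the original GraphSAGE paper and is an assumption of this sketch, as are the parameter defaults; it is not ArcadeDB's internal code:

```python
import math
import random

def graphsage_random(adj, dim=32, layers=2, seed=7):
    """Sketch: mean aggregation + random Xavier projection + ReLU, no training."""
    rng = random.Random(seed)
    # Structural input features: log-normalised degree plus Gaussian noise.
    feats = {v: [math.log(1 + len(adj[v])) + rng.gauss(0, 0.1) for _ in range(dim)]
             for v in adj}
    for _ in range(layers):
        # Random projection matrix with Xavier-style scale for a (2*dim -> dim) layer.
        limit = math.sqrt(6.0 / (2 * dim + dim))
        w = [[rng.uniform(-limit, limit) for _ in range(2 * dim)] for _ in range(dim)]
        nxt = {}
        for v in adj:
            mean = [0.0] * dim
            for u in adj[v]:
                for d in range(dim):
                    mean[d] += feats[u][d] / len(adj[v])
            concat = feats[v] + mean  # [self || neighbour mean]
            h = [max(0.0, sum(w[d][k] * concat[k] for k in range(2 * dim)))
                 for d in range(dim)]  # ReLU(W @ concat)
            norm = math.sqrt(sum(x * x for x in h)) or 1.0
            nxt[v] = [x / norm for x in h]  # L2 normalisation per layer
        feats = nxt
    return feats
```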
### Use Cases

- Inductive embeddings for graphs that grow over time
- Feature generation for node classification / link prediction
- Similarity search and clustering on graph data
### Example

```
CALL algo.graphsage({embeddingDimension: 32, layers: 2, seed: 7})
YIELD node, embedding
RETURN node.name AS name, embedding
```
### References

- Hamilton, W. L., Ying, R., Leskovec, J.: "Inductive Representation Learning on Large Graphs", NeurIPS 2017
## algo.hashgnn
| Property | Value |
|---|---|
| Procedure | `algo.hashgnn` |
| Category | Node Embedding |
| Complexity | CPU |
| Min Args | 0 |
| Max Args | 1 |
### Syntax

```
CALL algo.hashgnn([config]) YIELD node, embedding
```
### Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `config` | Map |  | Configuration map (see below) |
### Config Parameters

| Key | Type | Default | Description |
|---|---|---|---|
| `embeddingDimension` | Integer |  | Output embedding size (number of MinHash functions) |
| `iterations` | Integer |  | Message-passing rounds (receptive field = `iterations` hops) |
|  | String | all types | Comma-separated edge type names |
|  | String |  | Edge traversal direction |
| `seed` | Long |  | Random seed; -1 = random |
### Yield Fields

| Field | Type | Description |
|---|---|---|
| `node` | Vertex | The vertex |
| `embedding` | List<Float> | L2-normalised MinHash embedding vector of length `embeddingDimension` |
### Description

HashGNN is a training-free graph neural network that uses locality-sensitive hashing for neighbourhood aggregation. Each node is initialised with a sparse random binary feature vector (≈12.5% density) derived from its structural identity. In each propagation round, each node's feature set is expanded by OR-combining neighbour feature sets, then reduced to a fixed-size MinHash sketch using random linear hash functions `h_d(x) = (a·x + b) mod F`. The final embedding is the L2-normalised MinHash signature, providing probabilistic Jaccard-similarity guarantees: two nodes with similar MinHash signatures have similar neighbourhood feature sets.
HashGNN is extremely fast (no matrix multiplication, no gradient computation) and naturally handles graphs without node features.
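A simplified Python sketch of the idea follows. For brevity it applies the MinHash reduction once after all rounds rather than per round as described above; the feature-universe size and other constants are our own placeholders, not ArcadeDB's values:

```python
import math
import random

def hashgnn(adj, dim=64, iterations=3, feature_bits=256, seed=42):
    """Sketch: sparse binary features, OR-aggregation, MinHash signature."""
    rng = random.Random(seed)
    # Sparse random binary feature set (~12.5% density) per node.
    feats = {v: {f for f in range(feature_bits) if rng.random() < 0.125}
             for v in adj}
    # One random linear hash function h_d(x) = (a*x + b) mod F per dimension.
    hashes = [(rng.randrange(1, feature_bits), rng.randrange(feature_bits))
              for _ in range(dim)]
    for _ in range(iterations):
        # Expand each node's feature set by OR-combining neighbour sets.
        feats = {v: feats[v].union(*[feats[u] for u in adj[v]]) for v in adj}
    emb = {}
    for v, fset in feats.items():
        # MinHash signature: per hash function, the minimum hash over the set.
        sig = [min((a * x + b) % feature_bits for x in fset) if fset else 0
               for a, b in hashes]
        norm = math.sqrt(sum(s * s for s in sig)) or 1.0
        emb[v] = [s / norm for s in sig]  # L2-normalised signature
    return emb
```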
### Use Cases

- Ultra-fast structural embeddings as features for downstream ML
- Graph-level similarity via pooled embeddings
- Anomaly detection (unusual neighbourhood structure)
### Example

```
CALL algo.hashgnn({embeddingDimension: 64, iterations: 3, seed: 42})
YIELD node, embedding
RETURN node.name AS name, embedding
```