Node Embedding

Node embedding procedures map each vertex to a dense real-valued vector that encodes its structural position in the graph. The resulting vectors can be used directly as features for machine-learning tasks (classification, clustering, link prediction) or for similarity search using ArcadeDB’s vector index.

All embedding procedures yield a List<Float> that can be stored as a vertex property or forwarded to a downstream vector index.

algo.fastrp

Property   | Value
Procedure  | algo.fastrp
Category   | Node Embedding
Complexity | CPU
Min Args   | 0
Max Args   | 1
Syntax     | CALL algo.fastrp([config]) YIELD node, embedding

Parameters

Name   | Type | Default | Description
config | Map  | {}      | Configuration map (see below)

Config Parameters

Key           | Type    | Default   | Description
dimensions    | Integer | 128       | Embedding vector size
iterations    | Integer | 4         | Number of propagation rounds
normalization | Float   | 0.0       | Degree-normalisation exponent α: weight of neighbour j contributing to node i is proportional to deg(i)^{-α} × deg(j)^{-α}; 0 = no normalisation, 1 = GCN-style
selfInfluence | Float   | 0.0       | Weight in [0,1] given to the node’s own previous embedding vs. the aggregated neighbour embedding
relTypes      | String  | all types | Comma-separated edge type names
direction     | String  | BOTH      | Edge traversal direction: IN, OUT, or BOTH
seed          | Long    | -1        | Random seed; -1 = random

Yield Fields

Field     | Type        | Description
node      | Vertex      | The vertex
embedding | List<Float> | L2-normalised embedding vector of length dimensions

Description

FastRP (Fast Random Projection, Chen et al. 2019) generates dense node embeddings without any training phase. Each node is initialised with a sparse ternary random vector (values ±√3 with probability 1/6 each, 0 with probability 2/3, the optimal sparse random projection of Achlioptas 2003). The embedding is then iteratively refined by computing a weighted average of neighbour embeddings, followed by L2 normalisation. The resulting vectors capture multi-hop structural proximity, and the procedure runs orders of magnitude faster than walk-based methods.
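
To make these steps concrete, the following Python sketch reproduces them on a toy adjacency list. It is an illustration of the description above, not ArcadeDB's implementation; the function name fastrp, the toy graph, and the use of NumPy are assumptions made for the example.

import numpy as np

def fastrp(adj, dimensions=128, iterations=4, normalization=0.0,
           self_influence=0.0, seed=42):
    """Illustrative FastRP: sparse ternary init + iterative neighbour averaging."""
    rng = np.random.default_rng(seed)
    n = len(adj)
    deg = np.array([max(len(nbrs), 1) for nbrs in adj], dtype=float)

    # Achlioptas sparse projection: +/-sqrt(3) with prob 1/6 each, 0 with prob 2/3
    emb = rng.choice([np.sqrt(3), 0.0, -np.sqrt(3)],
                     size=(n, dimensions), p=[1/6, 2/3, 1/6])

    for _ in range(iterations):
        nxt = np.zeros_like(emb)
        for i, nbrs in enumerate(adj):
            for j in nbrs:
                # degree-normalised neighbour weight deg(i)^-a * deg(j)^-a
                w = (deg[i] ** -normalization) * (deg[j] ** -normalization)
                nxt[i] += w * emb[j]
            if nbrs:
                nxt[i] /= len(nbrs)          # average of neighbour contributions
        emb = self_influence * emb + (1 - self_influence) * nxt
        norms = np.linalg.norm(emb, axis=1, keepdims=True)
        emb = np.divide(emb, norms, out=np.zeros_like(emb), where=norms > 0)  # L2 normalise
    return emb

# Toy undirected path graph 0-1-2-3, as adjacency lists
adj = [[1], [0, 2], [1, 3], [2]]
print(fastrp(adj, dimensions=8, iterations=3).round(3))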

Use Cases

  • Fast baseline embeddings for downstream ML pipelines

  • Similarity search via vector index

  • Graph-aware feature generation without labelled data

Example

CALL algo.fastrp({dimensions: 64, iterations: 3, seed: 42})
YIELD node, embedding
RETURN node.name AS name, embedding

References

  • Chen, H., Sultan, S.F., Tian, Y., Chen, M., Skiena, S.: "Fast and Accurate Network Embeddings via Very Sparse Random Projection", CIKM 2019.


algo.node2vec

Property   | Value
Procedure  | algo.node2vec
Category   | Node Embedding
Complexity | CPU
Min Args   | 0
Max Args   | 1
Syntax     | CALL algo.node2vec([config]) YIELD node, embedding

Parameters

Name   | Type | Default | Description
config | Map  | {}      | Configuration map (see below)

Config Parameters

Key                | Type    | Default   | Description
embeddingDimension | Integer | 128       | Embedding vector size
walkLength         | Integer | 80        | Number of steps per random walk
walksPerNode       | Integer | 10        | Random walks generated per node
iterations         | Integer | 1         | Training epochs over all walks
windowSize         | Integer | 10        | Skip-gram context window radius
negSamples         | Integer | 5         | Negative samples per positive (centre, context) pair
learningRate       | Float   | 0.025     | Initial SGD learning rate (linearly decayed per epoch)
p                  | Float   | 1.0       | Return parameter: high p → less likely to return to previous node
q                  | Float   | 1.0       | In-out parameter: low q → DFS-like exploration; high q → BFS-like
relTypes           | String  | all types | Comma-separated edge type names
direction          | String  | BOTH      | Edge traversal direction
seed               | Long    | -1        | Random seed; -1 = random

Yield Fields

Field     | Type        | Description
node      | Vertex      | The vertex
embedding | List<Float> | L2-normalised embedding vector of length embeddingDimension

Description

Node2Vec (Grover & Leskovec, 2016) learns node embeddings using biased second-order random walks combined with a Skip-gram model trained via negative sampling. The walk bias is controlled by two parameters:

  • p (return parameter): low p makes the walk more likely to revisit the previous node (BFS-like neighbourhood exploration)

  • q (in-out parameter): low q biases the walk outward (DFS-like exploration), yielding homophily/community-oriented embeddings; high q keeps the walk close to the source (BFS-like), emphasising structural roles

The Skip-gram model is trained with SGD and negative sampling, using a linearly decaying learning rate per epoch.
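
The walk bias is easiest to see in the unnormalised transition weights of the second-order walk. The Python sketch below covers only that bias, not the Skip-gram training; the helper names next_step and walk and the toy graph are assumptions made for illustration, not the procedure's internals.

import random

def next_step(graph, prev, curr, p=1.0, q=1.0):
    """One biased node2vec step: weight candidates by distance to the previous node."""
    candidates = graph[curr]
    weights = []
    for nxt in candidates:
        if nxt == prev:                     # distance 0: return to the previous node
            weights.append(1.0 / p)
        elif nxt in graph[prev]:            # distance 1: stays near the previous node
            weights.append(1.0)
        else:                               # distance 2: moves outward
            weights.append(1.0 / q)
    return random.choices(candidates, weights=weights, k=1)[0]

def walk(graph, start, length, p=1.0, q=1.0, seed=42):
    random.seed(seed)
    path = [start, random.choice(graph[start])]
    while len(path) < length:
        path.append(next_step(graph, path[-2], path[-1], p, q))
    return path

# Toy graph: triangle {0,1,2} attached to a tail 2-3-4
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
print(walk(graph, 0, 10, p=1.0, q=0.5))   # low q: wanders outward (DFS-like)
print(walk(graph, 0, 10, p=1.0, q=4.0))   # high q: stays near the source (BFS-like)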

Use Cases

  • Community-aware embeddings (low q)

  • Structural-role embeddings (high q)

  • Link prediction feature generation

  • Node classification pre-training

Example

CALL algo.node2vec({embeddingDimension: 64, p: 1.0, q: 0.5, walkLength: 30, seed: 42})
YIELD node, embedding
RETURN node.name AS name, embedding

References

  • Grover, A., Leskovec, J.: "node2vec: Scalable Feature Learning for Networks", KDD 2016.


algo.graphsage

Property   | Value
Procedure  | algo.graphsage
Category   | Node Embedding
Complexity | CPU
Min Args   | 0
Max Args   | 1
Syntax     | CALL algo.graphsage([config]) YIELD node, embedding

Parameters

Name   | Type | Default | Description
config | Map  | {}      | Configuration map (see below)

Config Parameters

Key                | Type    | Default   | Description
embeddingDimension | Integer | 64        | Output embedding size per layer
layers             | Integer | 2         | Number of aggregation layers (receptive field = layers hops)
relTypes           | String  | all types | Comma-separated edge type names
direction          | String  | BOTH      | Edge traversal direction
seed               | Long    | -1        | Random seed; -1 = random

Yield Fields

Field     | Type        | Description
node      | Vertex      | The vertex
embedding | List<Float> | L2-normalised embedding vector of length embeddingDimension

Description

GraphSAGE (Hamilton et al., 2017) is an inductive neighbourhood-aggregation model. This implementation is unsupervised: no training labels are needed. Node features are initialised from structural properties (log-normalised degree plus Gaussian noise) and then propagated through layers rounds of mean aggregation, each followed by a random linear projection (Xavier initialisation) and a ReLU activation. The resulting embeddings capture multi-hop structural similarity.

Because the projection matrices are randomly initialised rather than trained, the embeddings are structurally consistent (nodes with similar neighbourhoods receive similar embeddings) and can be used directly as ML features or refined with downstream fine-tuning.
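
A minimal Python sketch of this untrained aggregation scheme follows. It is an illustration under stated assumptions, not the procedure's code: the noise scale, the concatenation of self and neighbourhood features before the projection, and the toy graph are choices made for the example.

import numpy as np

def graphsage_embed(adj, embedding_dimension=64, layers=2, seed=7):
    """Untrained GraphSAGE-style embeddings: mean aggregation + random projection + ReLU."""
    rng = np.random.default_rng(seed)
    n = len(adj)
    deg = np.array([len(nbrs) for nbrs in adj], dtype=float)

    # Structural input features: log-normalised degree plus a little Gaussian noise
    h = np.log1p(deg)[:, None] + rng.normal(0.0, 0.1, size=(n, embedding_dimension))

    for _ in range(layers):
        # Mean-aggregate neighbour features (fall back to own features for isolated nodes)
        agg = np.array([h[nbrs].mean(axis=0) if nbrs else h[i]
                        for i, nbrs in enumerate(adj)])
        msg = np.concatenate([h, agg], axis=1)      # self features || neighbourhood mean

        # Random linear projection with Xavier-style scaling, then ReLU
        fan_in, fan_out = msg.shape[1], embedding_dimension
        w = rng.normal(0.0, np.sqrt(2.0 / (fan_in + fan_out)), size=(fan_in, fan_out))
        h = np.maximum(msg @ w, 0.0)

    norms = np.linalg.norm(h, axis=1, keepdims=True)
    return np.divide(h, norms, out=np.zeros_like(h), where=norms > 0)   # L2 normalise

adj = [[1], [0, 2], [1, 3], [2]]          # path graph 0-1-2-3
print(graphsage_embed(adj, embedding_dimension=8, layers=2).shape)     # (4, 8)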

Use Cases

  • Inductive embeddings for graphs that grow over time

  • Feature generation for node classification / link prediction

  • Similarity search and clustering on graph data

Example

CALL algo.graphsage({embeddingDimension: 32, layers: 2, seed: 7})
YIELD node, embedding
RETURN node.name AS name, embedding

References

  • Hamilton, W.L., Ying, R., Leskovec, J.: "Inductive Representation Learning on Large Graphs", NeurIPS 2017.


algo.hashgnn

Property   | Value
Procedure  | algo.hashgnn
Category   | Node Embedding
Complexity | CPU
Min Args   | 0
Max Args   | 1
Syntax     | CALL algo.hashgnn([config]) YIELD node, embedding

Parameters

Name   | Type | Default | Description
config | Map  | {}      | Configuration map (see below)

Config Parameters

Key                | Type    | Default   | Description
embeddingDimension | Integer | 128       | Output embedding size (number of MinHash functions)
iterations         | Integer | 4         | Message-passing rounds (receptive field = iterations hops)
relTypes           | String  | all types | Comma-separated edge type names
direction          | String  | BOTH      | Edge traversal direction
seed               | Long    | -1        | Random seed; -1 = random

Yield Fields

Field     | Type        | Description
node      | Vertex      | The vertex
embedding | List<Float> | L2-normalised MinHash embedding vector of length embeddingDimension

Description

HashGNN is a training-free graph neural network that uses locality-sensitive hashing for neighbourhood aggregation. Each node is initialised with a sparse random binary feature vector (≈12.5% density) derived from its structural identity. For each propagation round, each node’s feature set is expanded by OR-combining neighbour feature sets, then reduced to a fixed-size MinHash sketch using random linear hash functions h_d(x) = (a·x + b) mod F. The final embedding is the L2-normalised MinHash signature, providing probabilistic Jaccard similarity guarantees: two nodes with similar MinHash signatures have similar neighbourhood feature sets.

HashGNN is extremely fast (no matrix multiplication, no gradient computation) and naturally handles graphs without node features.
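
The Python sketch below illustrates one way to realise the OR-combine-then-MinHash loop described above; it is not the procedure's implementation, and the feature-universe size, the initial density, and all helper names are assumptions made for the example.

import numpy as np

def hashgnn_embed(adj, embedding_dimension=128, iterations=4, seed=42,
                  feature_universe=4096, density=0.125):
    """Illustrative HashGNN: sparse binary features, OR aggregation, MinHash sketching."""
    rng = np.random.default_rng(seed)
    n = len(adj)

    # Random linear hash functions h_d(x) = (a*x + b) mod F, one per output dimension
    a = rng.integers(1, feature_universe, size=embedding_dimension)
    b = rng.integers(0, feature_universe, size=embedding_dimension)

    def hash_all(fs):
        xs = np.fromiter(fs, dtype=np.int64)
        return xs, (a[:, None] * xs + b[:, None]) % feature_universe   # (dims, |fs|)

    # Sparse random binary feature set per node (~12.5% of the feature universe)
    k = max(1, int(density * feature_universe))
    feats = [set(rng.choice(feature_universe, size=k, replace=False)) for _ in range(n)]

    for _ in range(iterations):
        new_feats = []
        for fs, nbrs in zip(feats, adj):
            # OR-combine the node's feature set with its neighbours' sets
            combined = fs.union(*(feats[j] for j in nbrs)) if nbrs else set(fs)
            xs, hashed = hash_all(combined)
            # fixed-size sketch: keep, per hash function, the feature with the minimum hash
            new_feats.append(set(xs[hashed.argmin(axis=1)]))
        feats = new_feats

    # Final embedding: L2-normalised vector of per-dimension minimum hash values
    emb = np.array([hash_all(fs)[1].min(axis=1) for fs in feats], dtype=float)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    return emb / np.where(norms == 0, 1.0, norms)

adj = [[1], [0, 2], [1, 3], [2]]          # path graph 0-1-2-3
print(hashgnn_embed(adj, embedding_dimension=16, iterations=2).shape)   # (4, 16)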

Use Cases

  • Ultra-fast structural embeddings as features for downstream ML

  • Graph-level similarity via pooled embeddings

  • Anomaly detection (unusual neighbourhood structure)

Example

CALL algo.hashgnn({embeddingDimension: 64, iterations: 3, seed: 42})
YIELD node, embedding
RETURN node.name AS name, embedding
