# Node Embedding
Node embedding procedures map each vertex to a dense real-valued vector that encodes its structural position in the graph. The resulting vectors can be used directly as features for machine-learning tasks (classification, clustering, link prediction) or for similarity search using ArcadeDB's vector index.

All embedding procedures yield a `List<Float>` that can be stored as a vertex property or forwarded to a downstream vector index.
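As an illustration of the similarity-search use case, here is a minimal Python sketch of comparing two exported embedding vectors. The function is our own example, not part of the ArcadeDB API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# The procedures below L2-normalise their output, so for stored
# embeddings the plain dot product already equals the cosine similarity.
```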
## algo.fastrp
| Property | Value |
|---|---|
| Procedure | `algo.fastrp` |
| Category | Node Embedding |
| Complexity | CPU |
| Min Args | 0 |
| Max Args | 1 |
### Syntax

```
CALL algo.fastrp([config]) YIELD node, embedding
```
### Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `config` | Map |  | Configuration map (see below) |
### Config Parameters

| Key | Type | Default | Description |
|---|---|---|---|
| `dimensions` | Integer |  | Embedding vector size |
| `iterations` | Integer |  | Number of propagation rounds |
|  | Float |  | Degree-normalisation exponent α: the weight of neighbour *j* contributing to node *i* is proportional to `deg(j)^α` |
|  | Float |  | Weight in [0, 1] given to the node's own previous embedding vs. the aggregated neighbour embedding |
|  | String | all types | Comma-separated edge type names |
|  | String |  | Edge traversal direction |
| `seed` | Long |  | Random seed; -1 = random |
### Yield Fields

| Field | Type | Description |
|---|---|---|
| `node` | Vertex | The vertex |
| `embedding` | List<Float> | L2-normalised embedding vector of length `dimensions` |
### Description

FastRP (Fast Random Projection, Chen et al. 2019) generates dense node embeddings without any training phase. Each node is initialised with a sparse ternary random vector (values ±√3 with probability 1/6 each, 0 with probability 2/3, the sparse random projection of Achlioptas 2003). The embedding is then iteratively refined by computing a weighted average of neighbour embeddings, followed by L2 normalisation. The resulting vectors capture multi-hop structural proximity, and the procedure runs orders of magnitude faster than walk-based methods.
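The following Python sketch illustrates these mechanics over an adjacency-list graph. It is an illustrative reimplementation under stated assumptions, not ArcadeDB's internal code, and the parameter defaults are placeholders:

```python
import math
import random

def l2_normalise(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def fastrp(adj, dimensions=128, iterations=3, alpha=0.0, self_weight=0.0, seed=42):
    """Sketch of FastRP over a graph given as {node: [neighbours]}."""
    rng = random.Random(seed)
    s3 = math.sqrt(3.0)

    def ternary():
        # +sqrt(3) w.p. 1/6, -sqrt(3) w.p. 1/6, 0 otherwise (Achlioptas 2003)
        r = rng.random()
        return s3 if r < 1 / 6 else (-s3 if r < 1 / 3 else 0.0)

    emb = {v: [ternary() for _ in range(dimensions)] for v in adj}
    for _ in range(iterations):
        nxt = {}
        for v, neighbours in adj.items():
            agg = [0.0] * dimensions
            for u in neighbours:
                w = len(adj[u]) ** alpha  # degree-normalisation weight deg(u)^alpha
                for d in range(dimensions):
                    agg[d] += w * emb[u][d]
            if neighbours:
                agg = [x / len(neighbours) for x in agg]
            # Blend the node's own previous embedding with the neighbour aggregate,
            # then L2-normalise, as in the description above.
            nxt[v] = l2_normalise([self_weight * a + (1 - self_weight) * b
                                   for a, b in zip(emb[v], agg)])
        emb = nxt
    return emb
```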
### Use Cases

- Fast baseline embeddings for downstream ML pipelines
- Similarity search via vector index
- Graph-aware feature generation without labelled data
### Example

```
CALL algo.fastrp({dimensions: 64, iterations: 3, seed: 42})
YIELD node, embedding
RETURN node.name AS name, embedding
```
### References

- Chen, H., Sultan, S. F., Tian, Y., Chen, M., Skiena, S.: "Fast and Accurate Network Embeddings via Very Sparse Random Projection", CIKM 2019
- Achlioptas, D.: "Database-friendly random projections: Johnson-Lindenstrauss with binary coins", Journal of Computer and System Sciences 66(4), 2003
## algo.node2vec
| Property | Value |
|---|---|
| Procedure | `algo.node2vec` |
| Category | Node Embedding |
| Complexity | CPU |
| Min Args | 0 |
| Max Args | 1 |
### Syntax

```
CALL algo.node2vec([config]) YIELD node, embedding
```
### Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `config` | Map |  | Configuration map (see below) |
### Config Parameters

| Key | Type | Default | Description |
|---|---|---|---|
| `embeddingDimension` | Integer |  | Embedding vector size |
| `walkLength` | Integer |  | Number of steps per random walk |
|  | Integer |  | Random walks generated per node |
|  | Integer |  | Training epochs over all walks |
|  | Integer |  | Skip-gram context window radius |
|  | Integer |  | Negative samples per positive (centre, context) pair |
|  | Float |  | Initial SGD learning rate (linearly decayed per epoch) |
| `p` | Float |  | Return parameter: high p → less likely to return to the previous node |
| `q` | Float |  | In-out parameter: low q → DFS-like exploration; high q → BFS-like |
|  | String | all types | Comma-separated edge type names |
|  | String |  | Edge traversal direction |
| `seed` | Long |  | Random seed; -1 = random |
### Yield Fields

| Field | Type | Description |
|---|---|---|
| `node` | Vertex | The vertex |
| `embedding` | List<Float> | L2-normalised embedding vector of length `embeddingDimension` |
### Description

Node2Vec (Grover & Leskovec, 2016) learns node embeddings using biased second-order random walks combined with a Skip-gram model trained via negative sampling. The walk bias is controlled by two parameters:

- p (return parameter): low p makes the walk more likely to revisit the previous node, keeping exploration local
- q (in-out parameter): low q makes the walk explore outward (DFS-like), capturing homophily and yielding community-oriented embeddings; high q keeps the walk close to the source (BFS-like), revealing structural roles

The Skip-gram model is trained with SGD and negative sampling, using a learning rate that decays linearly per epoch.
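A Python sketch of the biased second-order walk follows; the Skip-gram training step is omitted, and the function and graph representation are our own illustration, not ArcadeDB's implementation:

```python
import random

def biased_walk(adj, start, walk_length=30, p=1.0, q=1.0, rng=random):
    """One node2vec walk over a graph given as {node: [neighbours]}."""
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbours = adj[cur]
        if not neighbours:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbours))  # first step is unbiased
            continue
        prev = walk[-2]
        prev_nbrs = set(adj[prev])
        weights = []
        for x in neighbours:
            if x == prev:
                weights.append(1.0 / p)   # return to the previous node
            elif x in prev_nbrs:
                weights.append(1.0)       # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)   # move outward to distance 2
        walk.append(rng.choices(neighbours, weights=weights)[0])
    return walk
```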
### Use Cases

- Community-aware embeddings (low q)
- Structural-role embeddings (high q)
- Link prediction feature generation
- Node classification pre-training
### Example

```
CALL algo.node2vec({embeddingDimension: 64, p: 1.0, q: 0.5, walkLength: 30, seed: 42})
YIELD node, embedding
RETURN node.name AS name, embedding
```
### References

- Grover, A., Leskovec, J.: "node2vec: Scalable Feature Learning for Networks", KDD 2016
## algo.graphsage
| Property | Value |
|---|---|
| Procedure | `algo.graphsage` |
| Category | Node Embedding |
| Complexity | CPU |
| Min Args | 0 |
| Max Args | 1 |
### Syntax

```
CALL algo.graphsage([config]) YIELD node, embedding
```
### Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `config` | Map |  | Configuration map (see below) |
### Config Parameters

| Key | Type | Default | Description |
|---|---|---|---|
| `embeddingDimension` | Integer |  | Output embedding size per layer |
| `layers` | Integer |  | Number of aggregation layers (receptive field = `layers` hops) |
|  | String | all types | Comma-separated edge type names |
|  | String |  | Edge traversal direction |
| `seed` | Long |  | Random seed; -1 = random |
### Yield Fields

| Field | Type | Description |
|---|---|---|
| `node` | Vertex | The vertex |
| `embedding` | List<Float> | L2-normalised embedding vector of length `embeddingDimension` |
### Description

GraphSAGE (Hamilton et al., 2017) performs inductive neighbourhood aggregation. This implementation is unsupervised: no training labels are needed. Node features are initialised from structural properties (log-normalised degree plus Gaussian noise) and then propagated through `layers` rounds of mean aggregation, each followed by a random linear projection (Xavier initialisation) and ReLU activation. The resulting embeddings capture multi-hop structural similarity.

Because the projection matrices are randomly initialised rather than trained, the embeddings are structurally consistent (nodes with similar neighbourhoods receive similar embeddings) and can be used directly as ML features or refined with downstream fine-tuning.
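A Python sketch of this untrained propagation follows. Concatenating the self vector with the neighbour mean follows the original GraphSAGE paper and is an assumption of this sketch, as are the parameter defaults; it is not ArcadeDB's internal code:

```python
import math
import random

def graphsage_random(adj, dim=32, layers=2, seed=7):
    """Sketch: mean aggregation + random Xavier projection + ReLU, no training."""
    rng = random.Random(seed)
    # Structural input features: log-normalised degree plus Gaussian noise.
    feats = {v: [math.log(1 + len(adj[v])) + rng.gauss(0, 0.1) for _ in range(dim)]
             for v in adj}
    for _ in range(layers):
        # Random projection matrix with Xavier-style scale for a (2*dim -> dim) layer.
        limit = math.sqrt(6.0 / (2 * dim + dim))
        w = [[rng.uniform(-limit, limit) for _ in range(2 * dim)] for _ in range(dim)]
        nxt = {}
        for v in adj:
            mean = [0.0] * dim
            for u in adj[v]:
                for d in range(dim):
                    mean[d] += feats[u][d] / len(adj[v])
            concat = feats[v] + mean  # [self || neighbour mean]
            h = [max(0.0, sum(w[d][k] * concat[k] for k in range(2 * dim)))
                 for d in range(dim)]  # ReLU(W @ concat)
            norm = math.sqrt(sum(x * x for x in h)) or 1.0
            nxt[v] = [x / norm for x in h]  # L2 normalisation per layer
        feats = nxt
    return feats
```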
### Use Cases

- Inductive embeddings for graphs that grow over time
- Feature generation for node classification / link prediction
- Similarity search and clustering on graph data
### Example

```
CALL algo.graphsage({embeddingDimension: 32, layers: 2, seed: 7})
YIELD node, embedding
RETURN node.name AS name, embedding
```
### References

- Hamilton, W. L., Ying, R., Leskovec, J.: "Inductive Representation Learning on Large Graphs", NeurIPS 2017
## algo.hashgnn
| Property | Value |
|---|---|
| Procedure | `algo.hashgnn` |
| Category | Node Embedding |
| Complexity | CPU |
| Min Args | 0 |
| Max Args | 1 |
### Syntax

```
CALL algo.hashgnn([config]) YIELD node, embedding
```
### Parameters

| Name | Type | Default | Description |
|---|---|---|---|
| `config` | Map |  | Configuration map (see below) |
### Config Parameters

| Key | Type | Default | Description |
|---|---|---|---|
| `embeddingDimension` | Integer |  | Output embedding size (number of MinHash functions) |
| `iterations` | Integer |  | Message-passing rounds (receptive field = `iterations` hops) |
|  | String | all types | Comma-separated edge type names |
|  | String |  | Edge traversal direction |
| `seed` | Long |  | Random seed; -1 = random |
### Yield Fields

| Field | Type | Description |
|---|---|---|
| `node` | Vertex | The vertex |
| `embedding` | List<Float> | L2-normalised MinHash embedding vector of length `embeddingDimension` |
### Description

HashGNN is a training-free graph neural network that uses locality-sensitive hashing for neighbourhood aggregation. Each node is initialised with a sparse random binary feature vector (≈12.5% density) derived from its structural identity. In each propagation round, each node's feature set is expanded by OR-combining neighbour feature sets, then reduced to a fixed-size MinHash sketch using random linear hash functions `h_d(x) = (a·x + b) mod F`. The final embedding is the L2-normalised MinHash signature, providing probabilistic Jaccard-similarity guarantees: two nodes with similar MinHash signatures have similar neighbourhood feature sets.
HashGNN is extremely fast (no matrix multiplication, no gradient computation) and naturally handles graphs without node features.
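A simplified Python sketch of the idea follows. For brevity it applies the MinHash reduction once after all rounds rather than per round as described above; the feature-universe size and other constants are our own placeholders, not ArcadeDB's values:

```python
import math
import random

def hashgnn(adj, dim=64, iterations=3, feature_bits=256, seed=42):
    """Sketch: sparse binary features, OR-aggregation, MinHash signature."""
    rng = random.Random(seed)
    # Sparse random binary feature set (~12.5% density) per node.
    feats = {v: {f for f in range(feature_bits) if rng.random() < 0.125}
             for v in adj}
    # One random linear hash function h_d(x) = (a*x + b) mod F per dimension.
    hashes = [(rng.randrange(1, feature_bits), rng.randrange(feature_bits))
              for _ in range(dim)]
    for _ in range(iterations):
        # Expand each node's feature set by OR-combining neighbour sets.
        feats = {v: feats[v].union(*[feats[u] for u in adj[v]]) for v in adj}
    emb = {}
    for v, fset in feats.items():
        # MinHash signature: per hash function, the minimum hash over the set.
        sig = [min((a * x + b) % feature_bits for x in fset) if fset else 0
               for a, b in hashes]
        norm = math.sqrt(sum(s * s for s in sig)) or 1.0
        emb[v] = [s / norm for s in sig]  # L2-normalised signature
    return emb
```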
### Use Cases

- Ultra-fast structural embeddings as features for downstream ML
- Graph-level similarity via pooled embeddings
- Anomaly detection (unusual neighbourhood structure)
### Example

```
CALL algo.hashgnn({embeddingDimension: 64, iterations: 3, seed: 42})
YIELD node, embedding
RETURN node.name AS name, embedding
```