Similarity and Link Prediction

algo.jaccard

Property Value

Property	Value
Procedure	`algo.jaccard`
Category	Similarity
Complexity	CPU
Min Args	1
Max Args	4

Procedure

algo.jaccard

Category

Similarity

Complexity

CPU

Min Args

Max Args

Syntax

CALL algo.jaccard(node [, relTypes, direction, cutoff])
YIELD node1, node2, similarity

Parameters

Parameter Type Required Default Description

Parameter	Type	Required	Default	Description
`node`	Vertex	Yes	—	Source vertex to compare against all others
`relTypes`	String	No	all types	Comma-separated relationship types
`direction`	String	No	`"BOTH"`	Traversal direction: `"IN"`, `"OUT"`, or `"BOTH"`
`cutoff`	Double	No	`0.0`	Minimum similarity threshold; pairs below are excluded

node

Vertex

Yes

—

Source vertex to compare against all others

relTypes

String

all types

Comma-separated relationship types

direction

String

"BOTH"

Traversal direction: "IN", "OUT", or "BOTH"

cutoff

Double

0.0

Minimum similarity threshold; pairs below are excluded

Yield Fields

Field Type Description

Field	Type	Description
`node1`	RID	Source vertex identity
`node2`	RID	Compared vertex identity
`similarity`	Double	Jaccard similarity: \|N(src) ∩ N(v)\| / \|N(src) ∪ N(v)\|

node1

RID

Source vertex identity

node2

RID

Compared vertex identity

similarity

Double

Jaccard similarity: |N(src) ∩ N(v)| / |N(src) ∪ N(v)|

Description

Computes the Jaccard similarity between the given source vertex and all other vertices, based on neighborhood overlap. Formula: similarity = |N(src) ∩ N(v)| / |N(src) ∪ N(v)|. Pairs with similarity below cutoff are filtered out.

Use Cases

Collaborative filtering: "users with similar tastes"
Friend recommendation based on shared connections
Document similarity via shared keyword links

Example

MATCH (u:User {name:'Alice'})
CALL algo.jaccard(u, 'LIKES', 'OUT', 0.2)
YIELD node1, node2, similarity
RETURN node2, similarity ORDER BY similarity DESC LIMIT 10

References

Jaccard index – Wikipedia

algo.adamicAdar

Property Value

Property	Value
Procedure	`algo.adamicAdar`
Category	Link Prediction
Complexity	CPU
Min Args	1
Max Args	4

Procedure

algo.adamicAdar

Category

Link Prediction

Complexity

CPU

Min Args

Max Args

Syntax

CALL algo.adamicAdar(node [, relTypes, direction, cutoff])
YIELD node1, node2, score

Parameters

Parameter Type Required Default Description

Parameter	Type	Required	Default	Description
`node`	Vertex	Yes	—	Source vertex
`relTypes`	String	No	all types	Comma-separated relationship types
`direction`	String	No	`"BOTH"`	Traversal direction: `"IN"`, `"OUT"`, or `"BOTH"`
`cutoff`	Double	No	`0.0`	Minimum score threshold

node

Vertex

Yes

—

Source vertex

relTypes

String

all types

Comma-separated relationship types

direction

String

"BOTH"

Traversal direction: "IN", "OUT", or "BOTH"

cutoff

Double

0.0

Minimum score threshold

Yield Fields

Field Type Description

Field	Type	Description
`node1`	RID	Source vertex identity
`node2`	RID	Compared vertex identity
`score`	Double	Adamic-Adar score; higher = more likely to form a link

node1

RID

Source vertex identity

node2

RID

Compared vertex identity

score

Double

Adamic-Adar score; higher = more likely to form a link

Description

Computes the Adamic-Adar link prediction score between the source vertex and all non-adjacent vertices. For each pair, the score is the sum over common neighbors w of 1/log(degree(w)). Highly-connected common neighbors contribute less than rare common neighbors. Results are sorted by score descending.

Use Cases

Social network friend recommendation
Knowledge graph link prediction
Co-authorship prediction in citation networks

Example

MATCH (p:Person {name:'Bob'})
CALL algo.adamicAdar(p, 'KNOWS', 'BOTH', 0.5)
YIELD node1, node2, score
RETURN node2, score ORDER BY score DESC LIMIT 10

References

Adamic, L. & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211–230.

algo.commonNeighbors

Property Value

Property	Value
Procedure	`algo.commonNeighbors`
Category	Link Prediction
Complexity	CPU
Min Args	1
Max Args	4

Procedure

algo.commonNeighbors

Category

Link Prediction

Complexity

CPU

Min Args

Max Args

Syntax

CALL algo.commonNeighbors(node [, relTypes, direction, cutoff])
YIELD node1, node2, commonNeighbors

Parameters

Parameter Type Required Default Description

Parameter	Type	Required	Default	Description
`node`	Vertex	Yes	—	Source vertex
`relTypes`	String	No	all types	Comma-separated relationship types
`direction`	String	No	`"BOTH"`	Traversal direction: `"IN"`, `"OUT"`, or `"BOTH"`
`cutoff`	Integer	No	`1`	Minimum number of common neighbors

node

Vertex

Yes

—

Source vertex

relTypes

String

all types

Comma-separated relationship types

direction

String

"BOTH"

Traversal direction: "IN", "OUT", or "BOTH"

cutoff

Integer

1

Minimum number of common neighbors

Yield Fields

Field Type Description

Field	Type	Description
`node1`	RID	Source vertex identity
`node2`	RID	Compared vertex identity
`commonNeighbors`	Integer	Count of shared neighbors

node1

RID

Source vertex identity

node2

RID

Compared vertex identity

commonNeighbors

Integer

Count of shared neighbors

Description

Counts the number of common neighbors between the source vertex and all other vertices. Pairs with fewer than cutoff common neighbors are excluded. Results are sorted by count descending.

Use Cases

Simple friend-of-a-friend recommendation
Link prediction baseline
Mutual connection analysis

Example

MATCH (u:User {name:'Charlie'})
CALL algo.commonNeighbors(u, 'FOLLOWS', 'BOTH', 3)
YIELD node1, node2, commonNeighbors
RETURN node2, commonNeighbors ORDER BY commonNeighbors DESC LIMIT 10

References

Link prediction – Wikipedia

algo.preferentialAttachment

Property Value

Property	Value
Procedure	`algo.preferentialAttachment`
Category	Link Prediction
Complexity	CPU
Min Args	1
Max Args	3

Procedure

algo.preferentialAttachment

Category

Link Prediction

Complexity

CPU

Min Args

Max Args

Syntax

CALL algo.preferentialAttachment(node [, relTypes, direction])
YIELD node1, node2, score

Parameters

Parameter Type Required Default Description

Parameter	Type	Required	Default	Description
`node`	Vertex	Yes	—	Source vertex
`relTypes`	String	No	all types	Comma-separated relationship types
`direction`	String	No	`"BOTH"`	Traversal direction: `"IN"`, `"OUT"`, or `"BOTH"`

node

Vertex

Yes

—

Source vertex

relTypes

String

all types

Comma-separated relationship types

direction

String

"BOTH"

Traversal direction: "IN", "OUT", or "BOTH"

Yield Fields

Field Type Description

Field	Type	Description
`node1`	RID	Source vertex identity
`node2`	RID	Compared vertex identity
`score`	Long	Product of the two nodes' degrees

node1

RID

Source vertex identity

node2

RID

Compared vertex identity

score

Long

Product of the two nodes' degrees

Description

Computes the preferential attachment score for each pair as degree(node1) × degree(node2). Based on the observation that in evolving networks, new links preferentially form between high-degree nodes. A simple but effective baseline for link prediction in scale-free networks.

Use Cases

Scale-free network link prediction
Social network "people you may know" baseline
Network growth modeling

Example

MATCH (u:User {id:1})
CALL algo.preferentialAttachment(u, 'KNOWS', 'BOTH')
YIELD node1, node2, score
RETURN node2, score ORDER BY score DESC LIMIT 10

References

Preferential attachment – Wikipedia

algo.resourceAllocation

Property Value

Property	Value
Procedure	`algo.resourceAllocation`
Category	Link Prediction
Complexity	CPU
Min Args	1
Max Args	4

Procedure

algo.resourceAllocation

Category

Link Prediction

Complexity

CPU

Min Args

Max Args

Syntax

CALL algo.resourceAllocation(node [, relTypes, direction, cutoff])
YIELD node1, node2, score

Parameters

Parameter Type Required Default Description

Parameter	Type	Required	Default	Description
`node`	Vertex	Yes	—	Source vertex
`relTypes`	String	No	all types	Comma-separated relationship types
`direction`	String	No	`"BOTH"`	Traversal direction: `"IN"`, `"OUT"`, or `"BOTH"`
`cutoff`	Double	No	`0.0`	Minimum score threshold

node

Vertex

Yes

—

Source vertex

relTypes

String

all types

Comma-separated relationship types

direction

String

"BOTH"

Traversal direction: "IN", "OUT", or "BOTH"

cutoff

Double

0.0

Minimum score threshold

Yield Fields

Field Type Description

Field	Type	Description
`node1`	RID	Source vertex identity
`node2`	RID	Compared vertex identity
`score`	Double	Resource allocation score

node1

RID

Source vertex identity

node2

RID

Compared vertex identity

score

Double

Resource allocation score

Description

Computes the resource allocation link prediction score: score = Σ 1/degree(w) over all common neighbors w. Models the fraction of "resources" that each common neighbor can send to both endpoints if it distributes equally to all its neighbors. Conceptually similar to Adamic-Adar but uses degree directly rather than its logarithm.

Use Cases

Link prediction with emphasis on rare intermediaries
Recommendation in sparse networks
Metabolic network pathway prediction

Example

MATCH (p:Protein {id:'P1'})
CALL algo.resourceAllocation(p, 'INTERACTS', 'BOTH', 0.1)
YIELD node1, node2, score
RETURN node2, score ORDER BY score DESC LIMIT 10

References

Zhou, T., Lü, L., & Zhang, Y. C. (2009). Predicting missing links via local information. European Physical Journal B, 71(4), 623–630.

algo.simRank

Property Value

Property	Value
Procedure	`algo.simRank`
Category	Similarity
Complexity	CPU+RAM
Min Args	2
Max Args	5

Procedure

algo.simRank

Category

Similarity

Complexity

CPU+RAM

Min Args

Max Args

Syntax

CALL algo.simRank(nodeA, nodeB [, relTypes, decayFactor, maxIterations])
YIELD similarity, nodeAId, nodeBId

Parameters

Parameter Type Required Default Description

Parameter	Type	Required	Default	Description
`nodeA`	Vertex	Yes	—	First vertex
`nodeB`	Vertex	Yes	—	Second vertex
`relTypes`	String	No	all types	Comma-separated relationship types
`decayFactor`	Double	No	`0.8`	Decay constant C ∈ (0,1)
`maxIterations`	Integer	No	`5`	Number of refinement iterations

nodeA

Vertex

Yes

—

First vertex

nodeB

Vertex

Yes

—

Second vertex

relTypes

String

all types

Comma-separated relationship types

decayFactor

Double

0.8

Decay constant C ∈ (0,1)

maxIterations

Integer

5

Number of refinement iterations

Yield Fields

Field Type Description

Field	Type	Description
`similarity`	Double	SimRank similarity score between nodeA and nodeB
`nodeAId`	RID	Identity of nodeA
`nodeBId`	RID	Identity of nodeB

similarity

Double

SimRank similarity score between nodeA and nodeB

nodeAId

RID

Identity of nodeA

nodeBId

RID

Identity of nodeB

Description

Computes SimRank similarity between two vertices using the recursive definition: two nodes are similar if they are pointed to by similar nodes. Maintains a full V×V similarity matrix (O(V²) memory), updated iteratively. Similarity of a node with itself is 1.0. Uses IN adjacency for the standard directed graph formulation.

Use Cases

Structural equivalence analysis in directed graphs
Computing vertex pair similarity for matching problems
Entity resolution in knowledge graphs

Example

MATCH (a:Paper {id:'P1'}), (b:Paper {id:'P2'})
CALL algo.simRank(a, b, 'CITES', 0.8, 5)
YIELD similarity, nodeAId, nodeBId
RETURN nodeAId, nodeBId, similarity

References

Jeh, G. & Widom, J. (2002). SimRank: a measure of structural-context similarity. KDD 2002.

algo.knn

Property Value

Property	Value
Procedure	`algo.knn`
Category	Similarity
Complexity	CPU
Min Args	0
Max Args	3

Procedure

algo.knn

Category

Similarity

Complexity

CPU

Min Args

Max Args

Syntax

CALL algo.knn([k, relTypes, direction]) YIELD node1, node2, similarity

Parameters

Name Type Default Description

Name	Type	Default	Description
`k`	Integer	`5`	Number of nearest neighbours to return per node
`relTypes`	String	all types	Comma-separated edge type names to consider
`direction`	String	`"BOTH"`	Edge traversal direction: `IN`, `OUT`, or `BOTH`

k

Integer

5

Number of nearest neighbours to return per node

relTypes

String

all types

Comma-separated edge type names to consider

direction

String

"BOTH"

Edge traversal direction: IN, OUT, or BOTH

Yield Fields

Field Type Description

Field	Type	Description
`node1`	Vertex	Source vertex
`node2`	Vertex	One of the k nearest neighbours of node1
`similarity`	Float	Jaccard similarity score in [0.0, 1.0]

node1

Vertex

Source vertex

node2

Vertex

One of the k nearest neighbours of node1

similarity

Float

Jaccard similarity score in [0.0, 1.0]

Description

Constructs a k-Nearest Neighbours (KNN) graph using Jaccard similarity on neighbourhood sets. For each pair of nodes, the similarity is computed as |N(u) ∩ N(v)| / |N(u) ∪ N(v)|. Each node retains only its top-k neighbours by similarity score. This produces a new graph representing approximate structural proximity, useful as a pre-processing step for community detection or recommendation.

Use Cases

Constructing similarity graphs for downstream community detection
Collaborative filtering and recommendation
Detecting structurally similar entities (users, documents, products)

Example

CALL algo.knn(10, 'KNOWS', 'BOTH')
YIELD node1, node2, similarity
RETURN node1.name AS source, node2.name AS neighbour, similarity
ORDER BY source, similarity DESC

References

k-nearest neighbors algorithm – Wikipedia

algo.sameCommunity

Property Value

Property	Value
Procedure	`algo.sameCommunity`
Category	Link Prediction
Complexity	CPU
Min Args	3
Max Args	3

Procedure

algo.sameCommunity

Category

Link Prediction

Complexity

CPU

Min Args

Max Args

Syntax

CALL algo.sameCommunity(node1, node2, communityProperty) YIELD node1, node2, coefficient

Parameters

Name Type Default Description

Name	Type	Default	Description
`node1`	Vertex	—	First vertex
`node2`	Vertex	—	Second vertex
`communityProperty`	String	—	Property name holding the community identifier on each vertex

node1

Vertex

—

First vertex

node2

Vertex

—

Second vertex

communityProperty

String

—

Property name holding the community identifier on each vertex

Yield Fields

Field Type Description

Field	Type	Description
`node1`	Vertex	The first vertex
`node2`	Vertex	The second vertex
`coefficient`	Float	`1.0` if both nodes share the same community value, `0.0` otherwise

node1

Vertex

The first vertex

node2

Vertex

The second vertex

coefficient

Float

1.0 if both nodes share the same community value, 0.0 otherwise

Description

A simple link-prediction feature: returns 1.0 when both nodes carry the same value for the given community property and 0.0 otherwise. Useful as a binary feature in machine-learning pipelines or to filter / weight edges based on community membership.

Use Cases

Feature engineering for link-prediction models
Filtering candidate edges to intra-community connections
Evaluating community detection quality (ground-truth comparison)

Example

MATCH (a:Person), (b:Person)
WHERE a <> b
CALL algo.sameCommunity(a, b, 'community')
YIELD coefficient
WHERE coefficient = 1.0
RETURN a.name, b.name

References

Link prediction – Wikipedia

algo.totalNeighbors

Property Value

Property	Value
Procedure	`algo.totalNeighbors`
Category	Link Prediction
Complexity	CPU
Min Args	2
Max Args	4

Procedure

algo.totalNeighbors

Category

Link Prediction

Complexity

CPU

Min Args

Max Args

Syntax

CALL algo.totalNeighbors(node1, node2 [, relTypes, direction]) YIELD node1, node2, coefficient

Parameters

Name Type Default Description

Name	Type	Default	Description
`node1`	Vertex	—	First vertex
`node2`	Vertex	—	Second vertex
`relTypes`	String	all types	Comma-separated edge type names to traverse
`direction`	String	`"BOTH"`	Edge traversal direction: `IN`, `OUT`, or `BOTH`

node1

Vertex

—

First vertex

node2

Vertex

—

Second vertex

relTypes

String

all types

Comma-separated edge type names to traverse

direction

String

"BOTH"

Edge traversal direction: IN, OUT, or BOTH

Yield Fields

Field Type Description

Field	Type	Description
`node1`	Vertex	The first vertex
`node2`	Vertex	The second vertex
`coefficient`	Long	Size of the union of the two neighbourhood sets

node1

Vertex

The first vertex

node2

Vertex

The second vertex

coefficient

Long

Size of the union of the two neighbourhood sets

Description

Computes |N(u) ∪ N(v)| — the total number of distinct neighbours across both nodes. This is the denominator of Jaccard similarity and is used as a link-prediction feature measuring total neighbourhood reach. Calculated as |N(u)| + |N(v)| − |N(u) ∩ N(v)| using a BitSet intersection.

Use Cases

Feature engineering for link-prediction models
Identifying node pairs with large combined reach
Pre-filtering candidates before more expensive similarity computation

Example

MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CALL algo.totalNeighbors(a, b, 'KNOWS', 'BOTH')
YIELD coefficient
RETURN coefficient AS totalNeighbours

References

Jaccard index – Wikipedia