Similarity and Link Prediction
algo.jaccard
| Property | Value |
|---|---|
Procedure |
|
Category |
Similarity |
Complexity |
CPU |
Min Args |
1 |
Max Args |
4 |
Syntax
CALL algo.jaccard(node [, relTypes, direction, cutoff])
YIELD node1, node2, similarity
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
|
Vertex |
Yes |
— |
Source vertex to compare against all others |
|
String |
No |
all types |
Comma-separated relationship types |
|
String |
No |
|
Traversal direction: |
|
Double |
No |
|
Minimum similarity threshold; pairs below are excluded |
Yield Fields
| Field | Type | Description |
|---|---|---|
|
RID |
Source vertex identity |
|
RID |
Compared vertex identity |
|
Double |
Jaccard similarity: |N(src) ∩ N(v)| / |N(src) ∪ N(v)| |
Description
Computes the Jaccard similarity between the given source vertex and all other vertices, based on neighborhood overlap. Formula: similarity = |N(src) ∩ N(v)| / |N(src) ∪ N(v)|. Pairs with similarity below cutoff are filtered out.
Use Cases
-
Collaborative filtering: "users with similar tastes"
-
Friend recommendation based on shared connections
-
Document similarity via shared keyword links
Example
MATCH (u:User {name:'Alice'})
CALL algo.jaccard(u, 'LIKES', 'OUT', 0.2)
YIELD node1, node2, similarity
RETURN node2, similarity ORDER BY similarity DESC LIMIT 10
References
algo.adamicAdar
| Property | Value |
|---|---|
Procedure |
|
Category |
Link Prediction |
Complexity |
CPU |
Min Args |
1 |
Max Args |
4 |
Syntax
CALL algo.adamicAdar(node [, relTypes, direction, cutoff])
YIELD node1, node2, score
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
|
Vertex |
Yes |
— |
Source vertex |
|
String |
No |
all types |
Comma-separated relationship types |
|
String |
No |
|
Traversal direction: |
|
Double |
No |
|
Minimum score threshold |
Yield Fields
| Field | Type | Description |
|---|---|---|
|
RID |
Source vertex identity |
|
RID |
Compared vertex identity |
|
Double |
Adamic-Adar score; higher = more likely to form a link |
Description
Computes the Adamic-Adar link prediction score between the source vertex and all non-adjacent vertices. For each pair, the score is the sum over common neighbors w of 1/log(degree(w)). Highly-connected common neighbors contribute less than rare common neighbors. Results are sorted by score descending.
Use Cases
-
Social network friend recommendation
-
Knowledge graph link prediction
-
Co-authorship prediction in citation networks
Example
MATCH (p:Person {name:'Bob'})
CALL algo.adamicAdar(p, 'KNOWS', 'BOTH', 0.5)
YIELD node1, node2, score
RETURN node2, score ORDER BY score DESC LIMIT 10
References
-
Adamic, L. & Adar, E. (2003). Friends and neighbors on the web. Social Networks, 25(3), 211–230.
algo.commonNeighbors
| Property | Value |
|---|---|
Procedure |
|
Category |
Link Prediction |
Complexity |
CPU |
Min Args |
1 |
Max Args |
4 |
Syntax
CALL algo.commonNeighbors(node [, relTypes, direction, cutoff])
YIELD node1, node2, commonNeighbors
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
|
Vertex |
Yes |
— |
Source vertex |
|
String |
No |
all types |
Comma-separated relationship types |
|
String |
No |
|
Traversal direction: |
|
Integer |
No |
|
Minimum number of common neighbors |
Yield Fields
| Field | Type | Description |
|---|---|---|
|
RID |
Source vertex identity |
|
RID |
Compared vertex identity |
|
Integer |
Count of shared neighbors |
Description
Counts the number of common neighbors between the source vertex and all other vertices. Pairs with fewer than cutoff common neighbors are excluded. Results are sorted by count descending.
Use Cases
-
Simple friend-of-a-friend recommendation
-
Link prediction baseline
-
Mutual connection analysis
Example
MATCH (u:User {name:'Charlie'})
CALL algo.commonNeighbors(u, 'FOLLOWS', 'BOTH', 3)
YIELD node1, node2, commonNeighbors
RETURN node2, commonNeighbors ORDER BY commonNeighbors DESC LIMIT 10
References
algo.preferentialAttachment
| Property | Value |
|---|---|
Procedure |
|
Category |
Link Prediction |
Complexity |
CPU |
Min Args |
1 |
Max Args |
3 |
Syntax
CALL algo.preferentialAttachment(node [, relTypes, direction])
YIELD node1, node2, score
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
|
Vertex |
Yes |
— |
Source vertex |
|
String |
No |
all types |
Comma-separated relationship types |
|
String |
No |
|
Traversal direction: |
Yield Fields
| Field | Type | Description |
|---|---|---|
|
RID |
Source vertex identity |
|
RID |
Compared vertex identity |
|
Long |
Product of the two nodes' degrees |
Description
Computes the preferential attachment score for each pair as degree(node1) × degree(node2). Based on the observation that in evolving networks, new links preferentially form between high-degree nodes. A simple but effective baseline for link prediction in scale-free networks.
Use Cases
-
Scale-free network link prediction
-
Social network "people you may know" baseline
-
Network growth modeling
Example
MATCH (u:User {id:1})
CALL algo.preferentialAttachment(u, 'KNOWS', 'BOTH')
YIELD node1, node2, score
RETURN node2, score ORDER BY score DESC LIMIT 10
References
algo.resourceAllocation
| Property | Value |
|---|---|
Procedure |
|
Category |
Link Prediction |
Complexity |
CPU |
Min Args |
1 |
Max Args |
4 |
Syntax
CALL algo.resourceAllocation(node [, relTypes, direction, cutoff])
YIELD node1, node2, score
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
|
Vertex |
Yes |
— |
Source vertex |
|
String |
No |
all types |
Comma-separated relationship types |
|
String |
No |
|
Traversal direction: |
|
Double |
No |
|
Minimum score threshold |
Yield Fields
| Field | Type | Description |
|---|---|---|
|
RID |
Source vertex identity |
|
RID |
Compared vertex identity |
|
Double |
Resource allocation score |
Description
Computes the resource allocation link prediction score: score = Σ 1/degree(w) over all common neighbors w. Models the fraction of "resources" that each common neighbor can send to both endpoints if it distributes equally to all its neighbors. Conceptually similar to Adamic-Adar but uses degree directly rather than its logarithm.
Use Cases
-
Link prediction with emphasis on rare intermediaries
-
Recommendation in sparse networks
-
Metabolic network pathway prediction
Example
MATCH (p:Protein {id:'P1'})
CALL algo.resourceAllocation(p, 'INTERACTS', 'BOTH', 0.1)
YIELD node1, node2, score
RETURN node2, score ORDER BY score DESC LIMIT 10
References
-
Zhou, T., Lü, L., & Zhang, Y. C. (2009). Predicting missing links via local information. European Physical Journal B, 71(4), 623–630.
algo.simRank
| Property | Value |
|---|---|
Procedure |
|
Category |
Similarity |
Complexity |
CPU+RAM |
Min Args |
2 |
Max Args |
5 |
Syntax
CALL algo.simRank(nodeA, nodeB [, relTypes, decayFactor, maxIterations])
YIELD similarity, nodeAId, nodeBId
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
|
Vertex |
Yes |
— |
First vertex |
|
Vertex |
Yes |
— |
Second vertex |
|
String |
No |
all types |
Comma-separated relationship types |
|
Double |
No |
|
Decay constant C ∈ (0,1) |
|
Integer |
No |
|
Number of refinement iterations |
Yield Fields
| Field | Type | Description |
|---|---|---|
|
Double |
SimRank similarity score between nodeA and nodeB |
|
RID |
Identity of nodeA |
|
RID |
Identity of nodeB |
Description
Computes SimRank similarity between two vertices using the recursive definition: two nodes are similar if they are pointed to by similar nodes. Maintains a full V×V similarity matrix (O(V²) memory), updated iteratively. Similarity of a node with itself is 1.0. Uses IN adjacency for the standard directed graph formulation.
Use Cases
-
Structural equivalence analysis in directed graphs
-
Computing vertex pair similarity for matching problems
-
Entity resolution in knowledge graphs
Example
MATCH (a:Paper {id:'P1'}), (b:Paper {id:'P2'})
CALL algo.simRank(a, b, 'CITES', 0.8, 5)
YIELD similarity, nodeAId, nodeBId
RETURN nodeAId, nodeBId, similarity
References
-
Jeh, G. & Widom, J. (2002). SimRank: a measure of structural-context similarity. KDD 2002.
algo.knn
| Property | Value |
|---|---|
Procedure |
|
Category |
Similarity |
Complexity |
CPU |
Min Args |
0 |
Max Args |
3 |
Syntax
CALL algo.knn([k, relTypes, direction]) YIELD node1, node2, similarity
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
|
Integer |
|
Number of nearest neighbours to return per node |
|
String |
all types |
Comma-separated edge type names to consider |
|
String |
|
Edge traversal direction: |
Yield Fields
| Field | Type | Description |
|---|---|---|
|
Vertex |
Source vertex |
|
Vertex |
One of the k nearest neighbours of node1 |
|
Float |
Jaccard similarity score in [0.0, 1.0] |
Description
Constructs a k-Nearest Neighbours (KNN) graph using Jaccard similarity on neighbourhood sets. For each pair of nodes, the similarity is computed as |N(u) ∩ N(v)| / |N(u) ∪ N(v)|. Each node retains only its top-k neighbours by similarity score. This produces a new graph representing approximate structural proximity, useful as a pre-processing step for community detection or recommendation.
Use Cases
-
Constructing similarity graphs for downstream community detection
-
Collaborative filtering and recommendation
-
Detecting structurally similar entities (users, documents, products)
Example
CALL algo.knn(10, 'KNOWS', 'BOTH')
YIELD node1, node2, similarity
RETURN node1.name AS source, node2.name AS neighbour, similarity
ORDER BY source, similarity DESC
References
algo.sameCommunity
| Property | Value |
|---|---|
Procedure |
|
Category |
Link Prediction |
Complexity |
CPU |
Min Args |
3 |
Max Args |
3 |
Syntax
CALL algo.sameCommunity(node1, node2, communityProperty) YIELD node1, node2, coefficient
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
|
Vertex |
— |
First vertex |
|
Vertex |
— |
Second vertex |
|
String |
— |
Property name holding the community identifier on each vertex |
Yield Fields
| Field | Type | Description |
|---|---|---|
|
Vertex |
The first vertex |
|
Vertex |
The second vertex |
|
Float |
|
Description
A simple link-prediction feature: returns 1.0 when both nodes carry the same value for the given community property and 0.0 otherwise. Useful as a binary feature in machine-learning pipelines or to filter / weight edges based on community membership.
Use Cases
-
Feature engineering for link-prediction models
-
Filtering candidate edges to intra-community connections
-
Evaluating community detection quality (ground-truth comparison)
Example
MATCH (a:Person), (b:Person)
WHERE a <> b
CALL algo.sameCommunity(a, b, 'community')
YIELD coefficient
WHERE coefficient = 1.0
RETURN a.name, b.name
References
algo.totalNeighbors
| Property | Value |
|---|---|
Procedure |
|
Category |
Link Prediction |
Complexity |
CPU |
Min Args |
2 |
Max Args |
4 |
Syntax
CALL algo.totalNeighbors(node1, node2 [, relTypes, direction]) YIELD node1, node2, coefficient
Parameters
| Name | Type | Default | Description |
|---|---|---|---|
|
Vertex |
— |
First vertex |
|
Vertex |
— |
Second vertex |
|
String |
all types |
Comma-separated edge type names to traverse |
|
String |
|
Edge traversal direction: |
Yield Fields
| Field | Type | Description |
|---|---|---|
|
Vertex |
The first vertex |
|
Vertex |
The second vertex |
|
Long |
Size of the union of the two neighbourhood sets |
Description
Computes |N(u) ∪ N(v)| — the total number of distinct neighbours across both nodes. This is the denominator of Jaccard similarity and is used as a link-prediction feature measuring total neighbourhood reach. Calculated as |N(u)| + |N(v)| − |N(u) ∩ N(v)| using a BitSet intersection.
Use Cases
-
Feature engineering for link-prediction models
-
Identifying node pairs with large combined reach
-
Pre-filtering candidates before more expensive similarity computation
Example
MATCH (a:Person {name: 'Alice'}), (b:Person {name: 'Bob'})
CALL algo.totalNeighbors(a, b, 'KNOWS', 'BOTH')
YIELD coefficient
RETURN coefficient AS totalNeighbours
References