Efficient Graph Algorithms for Neo4j

The goal of this library is to provide efficiently implemented, parallel versions of common graph algorithms for Neo4j 3.x exposed as Cypher procedures.

You can find the documentation here http://neo4j-contrib.github.io/neo4j-graph-algorithms

Releases are available here: https://github.com/neo4j-contrib/neo4j-graph-algorithms/releases

Algorithms

Centralities:

Page Rank (algo.pageRank)
Betweenness Centrality (algo.betweenness)
Closeness Centrality (algo.closeness)
Harmonic Centrality (algo.harmonic)

Community Detection:

Louvain (algo.louvain)
Label Propagation (algo.labelPropagation)
(Weakly) Connected Components (algo.unionFind)
Strongly Connected Components (algo.scc)
Triangle Count / Clustering Coefficient (algo.triangleCount)

Path Finding:

Minimum Weight Spanning Tree (algo.mst)
All Pairs- and Single Source - Shortest Path (algo.shortestPath, algo.allShortestPaths)

These procedures work either on the whole graph or on a subgraph optionally filtered by label and relationship-type. You can also use filtering and projection using Cypher queries, see below.

We’d love your feedback, so please try out these algorithms and let us know how well they work for your use-case. Also please note things that you miss from installation instructions, documentation, etc.

Please raise GitHub issues for anything you encounter or join the neo4j-users Slack group and ask in the #neo4j-graph-algorithm channel.

Installation

Just copy the graph-algorithms-algo-*.jar from the matching release into your $NEO4J_HOME/plugins directory.

Because the algorithms use the lower level Kernel API to read from and write to Neo4j you also have to enable them in the configuration (for security reasons):

Add to $NEO4J_HOME/conf/neo4j.conf

dbms.security.procedures.unrestricted=algo.*

Then running call algo.list(); should list the algorithm procedures. You can also see the full list in the documentation.

Usage

These algorithms are exposed as Neo4j procedures. You can call them directly from Cypher in your Neo4j Browser, from cypher-shell or your client code.

For most algorithms we provide two procedures, one that writes results back to the graph as node-properties and reports statistics. And another (named algo.<name>.stream) that returns a stream of data, e.g. node-ids and computed values.

For large graphs the streaming procedure might return millions or billions of results, that’s why it is often more convenient to store the results of the algorithm and then use them with later queries.

The general call syntax is:

CALL algo.<name>([label],[relationshipType],{config})

For example for page rank on DBpedia (11M nodes, 116M relationships):

CALL algo.pageRank('Page','Link',{iterations:5, dampingFactor:0.85, write: true, writeProperty:'pagerank'});
// YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty

CALL algo.pageRank.stream('Page','Link',{iterations:5, dampingFactor:0.85})
YIELD node, score
RETURN node.title, score
ORDER BY score DESC LIMIT 10;

Projection via Cypher Queries

If label and relationship-type are not selective enough to describe your subgraph to run the algorithm on, you can use Cypher statements to load or project subsets of your graph. Then use a node-statement instead of the label parameter and a relationship-statement instead of the relationship-type and use graph:'cypher' in the config.

You can also return a property value or weight (according to your config) in addition to the id’s from these statements.

CALL algo.pageRank(
'MATCH (p:Page) RETURN id(p) as id',
'MATCH (p1:Page)-[:Link]->(p2:Page) RETURN id(p1) as source, id(p2) as target, count(*) as weight',
{graph:'cypher', iterations:5, write: true});

The detailed call syntax and all parameters and possible return values for each algorithm are listed in the project’s documentation

Graph Loading

As it can take some time to load large graphs into the algorithm data structures, you can pre-load graphs and then later refer to them by name for several graph algorithms. After usage they can be removed from memory to free resources used.

// Load graph
CALL algo.graph.load('my-graph','Label','REL_TYPE',{graph:'heavy',..other config...})
  YIELD name, graph, direction, undirected, sorted, nodes, loadMillis, alreadyLoaded,
        nodeWeight, relationshipWeight, nodeProperty, loadNodes, loadRelationships;

// Info on loaded graph
CALL algo.graph.info('my-graph')
  YIELD name, type, exists, removed, nodes;

// Use graph
CALL algo.pageRank(null,null,{graph:'my-graph',...})


// Remove graph
CALL algo.graph.remove('my-graph')
  YIELD name, type, exists, removed, nodes;

Building Locally

Currently aiming at Neo4j 3.x (with a branch per version)

git clone https://github.com/neo4j-contrib/neo4j-graph-algorithms
cd neo4j-graph-algorithms
git checkout 3.3
mvn clean install
cp algo/target/graph-algorithms-*.jar $NEO4J_HOME/plugins/
$NEO4J_HOME/bin/neo4j restart

KDD-OpenSource / neo4j-graph-algorithms