Welcome to Nebula Algorithm


nebula-algorithm is a Spark application based on GraphX. It currently provides the following algorithms:

Name	Use Case
PageRank	page ranking, key node discovery
Louvain	community detection, hierarchical clustering
KCore	community detection, financial risk control
LabelPropagation	community detection, information propagation, advertising recommendation
Hanp	community detection, information propagation
ConnectedComponent	community detection, isolated island detection
StronglyConnectedComponent	community detection
ShortestPath	path planning, network planning
TriangleCount	network structure analysis
GraphTriangleCount	network structure and tightness analysis
BetweennessCentrality	key node discovery, node influence calculation
ClosenessCentrality	key node discovery, node influence calculation
DegreeStatic	graph structure analysis
ClusteringCoefficient	recommendation, telecom fraud analysis
Jaccard	similarity calculation, recommendation
BFS	sequence traversal, shortest path planning
DFS	sequence traversal, shortest path planning
Node2Vec	graph machine learning, recommendation

You can either submit the entire Spark application or call the algorithms in the lib package directly to apply graph algorithms to a DataFrame.

Get Nebula Algorithm

Build Nebula Algorithm

$ git clone https://github.com/vesoft-inc/nebula-algorithm.git
$ cd nebula-algorithm
$ mvn clean package -Dgpg.skip -Dmaven.javadoc.skip=true -Dmaven.test.skip=true

After the build completes, the target file nebula-algorithm-3.0-SNAPSHOT.jar is placed under nebula-algorithm/target.

Download from Maven repo

Alternatively, you can download it from the following Maven repository:

https://repo1.maven.org/maven2/com/vesoft/nebula-algorithm/

Use Nebula Algorithm

Option 1: Submit nebula-algorithm package

Configuration

Refer to the configuration example.

Submit Spark Application

${SPARK_HOME}/bin/spark-submit --master <mode> --class com.vesoft.nebula.algorithm.Main nebula-algorithm-3.0-SNAPSHOT.jar -p application.conf

Option 2: Call nebula-algorithm interface

The lib package of nebula-algorithm provides more than 10 algorithms, which can be called programmatically as follows:
- Add dependencies in pom.xml.
```
 <dependency>
      <groupId>com.vesoft</groupId>
      <artifactId>nebula-algorithm</artifactId>
      <version>3.0.0</version>
 </dependency>
```
- Instantiate the algorithm's config. Below is an example for PageRank.
```
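import com.vesoft.nebula.algorithm.config.{Configs, PRConfig, SparkConfig}
import com.vesoft.nebula.algorithm.lib.PageRankAlgo
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local").getOrCreate()
// Read the edge list; the CSV file has a header row.
val data  = spark.read.option("header", true).csv("src/test/resources/edge.csv")
// PageRank settings: 5 iterations, reset probability 1.0.
val prConfig = new PRConfig(5, 1.0)
// The last argument states whether the edges carry weights.
val prResult = PageRankAlgo.apply(spark, data, prConfig, false)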
```
If your vertex IDs are strings, set encodeId = true in the algorithm config. See examples.
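
Some background on why string IDs need encoding: GraphX represents vertex IDs as 64-bit integers (`VertexId` is an alias for `Long`), so string IDs have to be mapped to numeric IDs before an algorithm runs and mapped back afterwards. The following is a minimal plain-Scala sketch of that idea only. It is not nebula-algorithm's implementation, and `IdEncodingSketch`, `buildIndex`, `encodeEdges`, and `decode` are hypothetical names.

```scala
// Illustrative only: a plain-Scala sketch of encoding string vertex IDs as Longs.
// All names here are hypothetical and not part of nebula-algorithm.
object IdEncodingSketch {
  type Edge[A] = (A, A)

  // Assign each distinct vertex ID a numeric ID, in order of first appearance.
  def buildIndex(edges: Seq[Edge[String]]): Map[String, Long] =
    edges
      .flatMap { case (src, dst) => Seq(src, dst) }
      .distinct
      .zipWithIndex
      .map { case (id, idx) => id -> idx.toLong }
      .toMap

  // Rewrite the edges so that both endpoints use the numeric IDs.
  def encodeEdges(edges: Seq[Edge[String]], index: Map[String, Long]): Seq[Edge[Long]] =
    edges.map { case (src, dst) => (index(src), index(dst)) }

  // Map per-vertex results (keyed by numeric ID) back to the original string IDs.
  def decode[V](results: Map[Long, V], index: Map[String, Long]): Map[String, V] = {
    val reverse = index.map(_.swap)
    results.map { case (id, value) => reverse(id) -> value }
  }
}
```

With `encodeId = true` you do not need to write this yourself; the sketch only shows the kind of mapping such an option implies.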

For examples of other algorithms, see examples

Note: In the application, the first column of the DataFrame holds the source vertices, the second holds the target vertices, and the third holds the edge weights.
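
The column layout described in the note can be sketched as follows. This example only builds a small in-memory DataFrame with Spark's standard API. The column names `src`, `dst`, and `weight` and the object name `EdgeFrameSketch` are illustrative assumptions, since the note specifies column positions rather than names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Illustrative only: builds an edge DataFrame in the expected column order.
// Column and object names are placeholders chosen for this sketch.
object EdgeFrameSketch {
  def buildEdges(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1L, 2L, 1.0), // source vertex, target vertex, edge weight
      (2L, 3L, 0.5),
      (3L, 1L, 2.0)
    ).toDF("src", "dst", "weight")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("edge-frame-sketch")
      .getOrCreate()
    val edges = buildEdges(spark)
    edges.printSchema()
    edges.show()
    spark.stop()
  }
}
```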

Sink to NebulaGraph

If you want to write the algorithm results into NebulaGraph (sink: nebula), make sure your tag definition contains the corresponding property name.

Algorithm	property name	property type
pagerank	pagerank	double/string
louvain	louvain	int/string
kcore	kcore	int/string
labelpropagation	lpa	int/string
connectedcomponent	cc	int/string
stronglyconnectedcomponent	scc	int/string
betweenness	betweenness	double/string
shortestpath	shortestpath	string
degreestatic	degree,inDegree,outDegree	int/string
trianglecount	trianglecount	int/string
clusteringcoefficient	clustercoefficient	double/string
closeness	closeness	double/string
hanp	hanp	int/string
bfs	bfs	string
dfs	dfs	string
jaccard	jaccard	string
node2vec	node2vec	string
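
As one way to prepare the schema, the sketch below generates an nGQL `CREATE TAG` statement with the property name and type that the table lists for an algorithm. It covers only a few rows of the table. The names `SinkSchemaSketch`, `SinkProperty`, and `createTagStatement`, as well as the tag name `algo_result` in the usage note, are hypothetical, so check the generated statement against your own schema before running it.

```scala
// Illustrative only: derives a CREATE TAG statement from the property table above.
// Object, class, and method names are hypothetical, not part of nebula-algorithm.
object SinkSchemaSketch {
  final case class SinkProperty(name: String, dataType: String)

  // algorithm -> sink property, copied from a few rows of the table
  val sinkProperties: Map[String, SinkProperty] = Map(
    "pagerank"           -> SinkProperty("pagerank", "double"),
    "louvain"            -> SinkProperty("louvain", "int"),
    "labelpropagation"   -> SinkProperty("lpa", "int"),
    "connectedcomponent" -> SinkProperty("cc", "int"),
    "shortestpath"       -> SinkProperty("shortestpath", "string")
  )

  // Returns None when the algorithm is not in the (partial) map above.
  def createTagStatement(tag: String, algorithm: String): Option[String] =
    sinkProperties.get(algorithm).map { p =>
      s"CREATE TAG IF NOT EXISTS `$tag`(`${p.name}` ${p.dataType});"
    }
}
```

For example, `SinkSchemaSketch.createTagStatement("algo_result", "pagerank")` yields a statement that declares a `pagerank` property of type `double` on the tag.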

Version Compatibility Matrix

NebulaGraph Algorithm Version	NebulaGraph Version	Spark Version
2.0.0	2.0.0, 2.0.1	2.4
2.1.0	2.0.0, 2.0.1	2.4
2.5.0	2.5.0, 2.5.1	2.4
2.6.0	2.6.0, 2.6.1	2.4
2.6.1	2.6.0, 2.6.1	2.4
2.6.2	2.6.0, 2.6.1	2.4
3.0.0, 3.1.x	3.0.x, 3.1.x, 3.2.x, 3.3.x	2.4
3.0-SNAPSHOT	nightly	2.4

Contribute

Nebula Algorithm is open source, and you are more than welcome to contribute in the following ways:

- Discuss in the community via the forum or raise issues here.
- Write or improve our documentation.
- Submit a pull request to help improve the code itself here.

About

Nebula-Algorithm is a Spark application based on GraphX that enables state-of-the-art graph algorithms to run on top of NebulaGraph and write results back to NebulaGraph.
