weilu / cs5234

Densest Subgraph Algorithms

Code repo for CS5234 Combinatorial and Graph Algorithms mini-project.

Development

Data Preprocessing

The undirected graph data comes from https://snap.stanford.edu/data/. Each file lists edges as tab-separated vertex pairs, one pair per line. Some files list an edge in both directions (for example 1 2 and 2 1), but not all of them do, so the preprocessing step removes such duplicates along with comment lines that start with #.

# produces data/*_preprocessed.txt files
python preprocess.py
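
The cleanup described above can be sketched as follows. This is a minimal illustration, not the contents of preprocess.py: the function name preprocess_lines is hypothetical, and keeping the first orientation seen for each edge is an assumption.

```python
def preprocess_lines(lines):
    """Drop comment lines and reversed duplicates from an edge list.

    Hypothetical helper: assumes integer vertex ids and whitespace
    (tab) separated pairs, as in the SNAP edge-list files.
    """
    seen = set()      # undirected edges already emitted, stored as (min, max)
    cleaned = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        u, v = (int(x) for x in line.split()[:2])
        key = (min(u, v), max(u, v))
        if key in seen:
            continue  # same edge already seen, possibly reversed
        seen.add(key)
        cleaned.append(f"{u}\t{v}")  # keep the first orientation seen
    return cleaned


if __name__ == "__main__":
    sample = ["# Nodes: 3 Edges: 4", "1\t2", "2\t1", "2\t3", "3\t2"]
    print("\n".join(preprocess_lines(sample)))
```

Storing each edge as a (min, max) pair makes 1 2 and 2 1 compare equal, so files that already list each edge once pass through unchanged.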

Baseline Algorithm

Exact min-cut-based algorithm by Goldberg, implemented in Python.

Setup:

# If not on macOS, see https://git.skewed.de/count0/graph-tool/wikis/installation-instructions
brew install graph-tool

# virtualenv needs to be created with --system-site-packages to access graph-tool
mkvirtualenv --python=`which python3` --system-site-packages cs5234

Run:

python densest_subgraph_goldberg.py
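
To make the idea behind the baseline concrete, here is a minimal pure-Python sketch of Goldberg's construction: a binary search over a density guess g, with one min-cut test per guess. It is not the code in densest_subgraph_goldberg.py (which, judging by the setup above, relies on graph-tool). The function names, the "s"/"t" labels, and the plain Edmonds-Karp max-flow are assumptions made to keep the example self-contained, and it is only practical for small graphs. It assumes a deduplicated edge list with integer vertex ids, at least one edge, and no self-loops.

```python
from collections import deque
from fractions import Fraction


def min_cut_source_side(cap, s, t):
    """Run Edmonds-Karp on residual capacities `cap` (mutated in place) and
    return the set of vertices reachable from s once no augmenting path remains."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # source side of a minimum cut
        # Walk back from t to s, then push the bottleneck along the path.
        path = []
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push


def build_network(nodes, edges, g):
    """Goldberg's network for density guess g: s->v has capacity m,
    v->t has capacity m + 2g - deg(v), and each edge has capacity 1 both ways."""
    m = len(edges)
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    cap = {x: {} for x in list(nodes) + ["s", "t"]}
    for v in nodes:
        cap["s"][v] = Fraction(m)
        cap[v]["s"] = Fraction(0)
        cap[v]["t"] = m + 2 * g - deg[v]
        cap["t"][v] = Fraction(0)
    for u, v in edges:
        cap[u][v] = Fraction(1)
        cap[v][u] = Fraction(1)
    return cap


def densest_subgraph(edges):
    """Return the vertex set maximising |E(S)| / |S| (exact, small graphs only)."""
    nodes = sorted({v for e in edges for v in e})
    n, m = len(nodes), len(edges)
    low, high = Fraction(0), Fraction(m)
    best = set(nodes)
    # Two distinct subgraph densities differ by at least 1 / (n (n - 1)).
    while high - low >= Fraction(1, n * (n - 1)):
        g = (low + high) / 2
        side = min_cut_source_side(build_network(nodes, edges, g), "s", "t") - {"s"}
        if side:   # some subgraph has density greater than g
            low, best = g, side
        else:      # no subgraph has density greater than g
            high = g
    return best


if __name__ == "__main__":
    # K4 on {1, 2, 3, 4} with a pendant path 4-5-6 attached.
    demo = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
    print(sorted(densest_subgraph(demo)))
```

Exact fractions are used instead of floats so that the "is the source side empty" test is not affected by rounding. A real implementation would replace the toy max-flow with a library routine.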

Distributed Algorithm

2-approximation algorithm by Bahmani, Kumar and Vassilvitskii, implemented in Java on Giraph.

Setup: http://giraph.apache.org/quick_start.html

Run:

cd giraph/DensestSubgraphApprox/

# make jar
mvn clean compile assembly:single

# upload test data to hdfs
$HADOOP_HOME/bin/hdfs dfs -mkdir -p /user/giraph/input/
$HADOOP_HOME/bin/hdfs dfs -copyFromLocal ../../data/*_preprocessed.txt /user/giraph/input/

# run job; adjust the jar paths below to match your Giraph build and project checkout
export HADOOP_CLASSPATH=/usr/local/giraph/giraph-core/target/giraph-1.3.0-SNAPSHOT-for-hadoop-2.8.2-jar-with-dependencies.jar:/Users/luwei/workspace/CS5234/cs5234_project/giraph/DensestSubgraphApprox/target/DensestSubgraphApprox-1.0-SNAPSHOT-jar-with-dependencies.jar
export LIBJARS=/usr/local/giraph/giraph-core/target/giraph-1.3.0-SNAPSHOT-for-hadoop-2.8.2-jar-with-dependencies.jar,/Users/luwei/workspace/CS5234/cs5234_project/giraph/DensestSubgraphApprox/target/DensestSubgraphApprox-1.0-SNAPSHOT-jar-with-dependencies.jar
hadoop jar /usr/local/giraph/giraph-core/target/giraph-1.3.0-SNAPSHOT-for-hadoop-2.8.2-jar-with-dependencies.jar \
  org.apache.giraph.GiraphRunner -libjars ${LIBJARS} weilu.App \
  -mc weilu.DensityMasterCompute -aw org.apache.giraph.aggregators.TextAggregatorWriter \
  -ca giraph.textAggregatorWriter.frequency=1 \
  -ca giraph.vertex.resolver.create.on.msgs=false \
  -ca giraph.logLevel=trace \
  -eif org.apache.giraph.io.formats.IntNullReverseTextEdgeInputFormat \
  -eip /user/giraph/input/ca-GrQc_preprocessed.txt -w 1

# see results
$HADOOP_HOME/bin/hdfs dfs -cat /user/giraph/aggregatorValues_0
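
For readers who want to see the approximation scheme without a Hadoop cluster, the following is a sequential Python sketch of the peeling procedure from the Bahmani, Kumar and Vassilvitskii paper: in each pass, every vertex whose degree is at most 2(1 + epsilon) times the current density is removed, and the densest intermediate subgraph is kept. It is not the Giraph implementation in this repo. The function name and the epsilon parameter (defaulting to 0) are assumptions for illustration, and the input is assumed to be a deduplicated edge list with at least one edge and no self-loops.

```python
def approx_densest_subgraph(edges, epsilon=0.0):
    """Peel low-degree vertices in passes; return (best vertex set, its density)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    current = set(adj)        # vertices still present
    num_edges = len(edges)    # edges among the vertices still present
    best = set(current)
    best_density = num_edges / len(current)

    while current:
        n = len(current)
        # deg(v) <= 2 (1 + epsilon) * (num_edges / n), written without division.
        removed = {v for v in current
                   if len(adj[v]) * n <= 2 * (1 + epsilon) * num_edges}
        for v in removed:
            # adj[v] only holds neighbours not yet deleted, so every edge
            # is subtracted exactly once.
            num_edges -= len(adj[v])
            for w in adj[v]:
                adj[w].discard(v)
            del adj[v]
        current -= removed
        if current and num_edges / len(current) > best_density:
            best = set(current)
            best_density = num_edges / len(current)
    return best, best_density


if __name__ == "__main__":
    # K4 on {1, 2, 3, 4} with a pendant path 4-5-6 attached.
    demo = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)]
    print(approx_densest_subgraph(demo))
```

Each pass removes at least one vertex, because the minimum degree never exceeds the average degree, which is twice the density. In the distributed setting one pass corresponds roughly to one round of degree computation and removal across workers.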
