joneswong / STRAP-git

Scalable Graph Embeddings via Sparse Transpose Proximities

Tested Environment

  • Ubuntu
  • C++11
  • GCC 4.8
  • Intel C++ Compiler
  • Boost (optional; required only for link prediction)

Preparation

Place the prepared data file [file].txt in the NR_Dataset directory. The first line of the file gives the number of nodes; each subsequent line describes one edge as [outNode] [inNode].

Directed and undirected graphs are handled separately: use the executables ending in _U for undirected graphs and those ending in _D for directed graphs.
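The input format described above can be checked with a short script before running the C++ tools. This is a minimal sketch, not part of the repository: the function name `read_edge_list` is hypothetical, and it assumes whitespace-separated integer node ids in the range [0, node count), which the README does not state explicitly.

```python
import io

def read_edge_list(stream, directed):
    """Parse a graph file: first line = node count, then 'outNode inNode' per line."""
    lines = [line.strip() for line in stream if line.strip()]
    num_nodes = int(lines[0])
    edges = set()
    for line in lines[1:]:
        out_node, in_node = (int(tok) for tok in line.split()[:2])
        # Assumption: ids are 0-based and smaller than the declared node count.
        if not (0 <= out_node < num_nodes and 0 <= in_node < num_nodes):
            raise ValueError(f"node id out of range: {line!r}")
        if not directed:
            # Store each undirected edge once, whichever direction the file lists.
            out_node, in_node = min(out_node, in_node), max(out_node, in_node)
        edges.add((out_node, in_node))
    return num_nodes, sorted(edges)

sample = "4\n0 1\n1 2\n2 1\n3 0\n"
n, directed_edges = read_edge_list(io.StringIO(sample), directed=True)
_, undirected_edges = read_edge_list(io.StringIO(sample), directed=False)
print(n, directed_edges, undirected_edges)
```

Normalizing undirected edges to a (smaller id, larger id) pair means the check works whether the file lists each undirected edge once or in both directions.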

The datasets used in our paper are provided in NR_Dataset.

Data Set     Directed  Nodes (N)  Edges (M)
BlogCatalog  No            10312     333983
Flickr       No            80513    5899882
YouTube      No          1138499    2990443
WikiVote     Yes            7115     103689
Slashdot     Yes           82168     870161
Euro         No              399       5993
Brazil       No              131       1003

Compilation

Move the files in the frPCA folder to the root directory, then compile:

bash compile.sh

Usage

We provide two versions of the code to ensure reproducibility.

STRAP based on SVD

The SVD version is implemented with Eigen 3.x.

./STRAP_SVD_U <graph_name> <data_path> <emb_path> <alpha> <iteration> <error> <threads>

Parameters

  • graph_name: name of target graph
  • data_path: path to load source file
  • emb_path: path to save embedding files
  • alpha: parameter for PPR
  • iteration: parameter for SVD
  • error: parameter for Backward Push
  • threads: number of threads to use
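To give a feel for how alpha and error interact, here is a minimal sketch of the textbook backward push procedure for approximating personalized PageRank (PPR) to a single target node. It is not the repository's C++ implementation. It assumes that alpha is the probability of stopping the random walk at each step and that error plays the role of the residue threshold (called `r_max` here); the function name and data layout are hypothetical.

```python
from collections import defaultdict, deque

def backward_push(in_neighbors, out_degree, target, alpha, r_max):
    """Estimate PPR(s, target) for every source s; returns (reserve, residue)."""
    reserve = defaultdict(float)   # current estimates of PPR(s, target)
    residue = defaultdict(float)   # probability mass not yet pushed
    residue[target] = 1.0
    queue, queued = deque([target]), {target}
    while queue:
        v = queue.popleft()
        queued.discard(v)
        r_v = residue[v]
        if r_v <= r_max:
            continue
        # Push: keep an alpha share at v, spread the rest over v's in-neighbors.
        residue[v] = 0.0
        reserve[v] += alpha * r_v
        for u in in_neighbors.get(v, ()):
            residue[u] += (1.0 - alpha) * r_v / out_degree[u]
            if residue[u] > r_max and u not in queued:
                queue.append(u)
                queued.add(u)
    return dict(reserve), dict(residue)

# Toy directed graph with edges 0->1, 0->2, 1->2, 2->0.
in_neighbors = {0: [2], 1: [0], 2: [0, 1]}
out_degree = {0: 2, 1: 1, 2: 1}
reserve, residue = backward_push(in_neighbors, out_degree, target=2,
                                 alpha=0.5, r_max=1e-5)
print(reserve)
```

When every node has at least one out-edge, each estimate in this sketch undershoots the true PPR value by at most `r_max`, so a smaller threshold gives more accurate proximities at the cost of more pushes.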

Examples

For an undirected graph:

./STRAP_SVD_U BlogCatalog-u NR_Dataset/ NR_EB/ 0.5 12 0.00001 24

For a directed graph:

./STRAP_SVD_D wikivote NR_Dataset/ NR_EB/ 0.5 12 0.00001 24

STRAP based on frPCA

All results in our paper are based on this version, which uses frPCA for better performance.

./STRAP_FRPCA_U <graph_name> <data_path> <emb_path> <alpha> <iteration> <error> <threads>

Parameters

  • iteration: parameter for frPCA
  • all other parameters are the same as above

Examples

For an undirected graph:

./STRAP_FRPCA_U BlogCatalog-u NR_Dataset/ NR_EB/ 0.5 12 0.00001 24

For a directed graph:

./STRAP_FRPCA_D wikivote NR_Dataset/ NR_EB/ 0.5 12 0.00001 24

Experiments

Graph Reconstruction

Train embeddings on the full graph, then reconstruct the graph from them. Code for computing reconstruction precision is provided:

./NET_RE_U BlogCatalog-u strap_frpca_u
./NET_RE_D wikivote strap_frpca_d

For large graphs such as YouTube, we sample a subgraph for reconstruction.
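As an illustration of what a reconstruction precision measures, the sketch below ranks all ordered node pairs by a dot-product score and reports the fraction of the top k pairs that are real edges. This is an assumed definition (precision at k), not necessarily the exact metric computed by NET_RE_U and NET_RE_D; the function name and the list-of-lists embedding layout are hypothetical.

```python
def reconstruction_precision(source_emb, target_emb, edges, k):
    """Fraction of the k highest-scoring ordered pairs (u, v) that are true edges."""
    edge_set = set(edges)
    n = len(source_emb)
    scored = []
    for u in range(n):
        for v in range(n):
            if u != v:
                score = sum(a * b for a, b in zip(source_emb[u], target_emb[v]))
                scored.append((score, u, v))
    # Highest score first; this enumerates all n*(n-1) pairs.
    scored.sort(key=lambda item: item[0], reverse=True)
    hits = sum((u, v) in edge_set for _, u, v in scored[:k])
    return hits / k

source_emb = [[1, 0], [0, 1], [0, 0]]
target_emb = [[0, 0], [1, 0], [0, 1]]
edges = [(0, 1), (1, 2)]
print(reconstruction_precision(source_emb, target_emb, edges, k=2))
print(reconstruction_precision(source_emb, target_emb, edges, k=4))
```

Scoring every pair costs time quadratic in the number of nodes, which is one reason evaluating on a sampled subgraph is attractive for large graphs.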

Link Prediction

First, split the graph into training and testing sets and generate negative samples. The resulting datasets are saved separately in LP_Dataset. The second argument sets the fraction of edges held out for testing:

./GEN_DATA_U BlogCatalog-u 0.5
./GEN_DATA_D wikivote 0.5
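The splitting step can be pictured with the following minimal sketch for a directed graph. It is an assumption about the general procedure, not a description of what GEN_DATA_U and GEN_DATA_D actually do: the function name is hypothetical, and drawing as many negative samples as there are test edges is our choice for the example.

```python
import random

def split_edges(num_nodes, edges, test_ratio, seed=0):
    """Hold out a fraction of edges for testing and sample equally many non-edges."""
    rng = random.Random(seed)
    existing = set(edges)
    shuffled = sorted(existing)
    rng.shuffle(shuffled)
    num_test = int(len(shuffled) * test_ratio)
    test_pos, train = shuffled[:num_test], shuffled[num_test:]
    # Guard against asking for more non-edges than the graph has.
    available = num_nodes * (num_nodes - 1) - sum(1 for u, v in existing if u != v)
    if num_test > available:
        raise ValueError("not enough non-edges to sample negatives")
    negatives = set()
    while len(negatives) < num_test:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        if u != v and (u, v) not in existing:
            negatives.add((u, v))
    return train, test_pos, sorted(negatives)

ring = [(i, (i + 1) % 10) for i in range(10)]
train, test_pos, test_neg = split_edges(10, ring, test_ratio=0.5)
print(len(train), len(test_pos), len(test_neg))
```

Negative samples are drawn from pairs that are edges in neither the training nor the testing part, so a negative can never coincide with a held-out true edge.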

Then compute embeddings on the training set and predict missing edges using the score $s_u \cdot t_v$. Code for computing link prediction precision is provided:

./LINK_PRE_U BlogCatalog-u strap_frpca_u
./LINK_PRE_D wikivote strap_frpca_d
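The scoring rule $s_u \cdot t_v$ can be sketched as follows. We assume here that $s_u$ and $t_v$ are the source and target embedding vectors of nodes u and v, and that precision is the fraction of true test edges among the top-ranked candidates; the exact metric used by LINK_PRE_U and LINK_PRE_D may differ. All names in the snippet are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def link_prediction_precision(source_emb, target_emb, positives, negatives):
    """Rank test edges and negative samples by s_u . t_v; score the top len(positives)."""
    candidates = [(dot(source_emb[u], target_emb[v]), 1) for u, v in positives]
    candidates += [(dot(source_emb[u], target_emb[v]), 0) for u, v in negatives]
    candidates.sort(key=lambda item: item[0], reverse=True)
    top = candidates[:len(positives)]
    return sum(label for _, label in top) / len(positives)

source_emb = [[1, 0], [0, 1], [1, 1]]
target_emb = [[0.1, 0.4], [0.9, 0.0], [0.8, 0.2]]
positives = [(0, 1), (2, 0)]   # held-out true edges
negatives = [(1, 0), (0, 2)]   # sampled non-edges
print(link_prediction_precision(source_emb, target_emb, positives, negatives))  # 0.5
```

In this toy example the negative pair (0, 2) scores 0.8 and outranks the true edge (2, 0) at 0.5, so only one of the top two candidates is a real edge.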

Node Classification

Train a classifier using the embeddings of the full graph, the provided labels, and the training set. Performance is reported as average Micro-F1 and average Macro-F1. This part is implemented with Python 3.4 and sklearn 0.20.1:

python labelclassification.py BlogCatalog-u strap_frpca_u
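For readers unfamiliar with the two metrics, the sketch below computes Micro-F1 and Macro-F1 from per-node label sets using their standard definitions. It is independent of labelclassification.py, whose classifier and train/test protocol are not shown here; the function name and the set-based input format are assumptions made for the example.

```python
def micro_macro_f1(true_labels, predicted_labels):
    """true_labels / predicted_labels: one set of labels per node."""
    labels = sorted(set().union(*true_labels, *predicted_labels))
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}
    for truth, pred in zip(true_labels, predicted_labels):
        for label in pred:
            if label in truth:
                tp[label] += 1
            else:
                fp[label] += 1
        for label in truth - pred:
            fn[label] += 1

    def f1(t, p, n):
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    # Micro-F1 pools the counts over labels; Macro-F1 averages per-label F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels) if labels else 0.0
    return micro, macro

truth = [{0}, {1}, {0, 1}, {2}]
pred = [{0}, {0}, {0, 1}, {1}]
micro, macro = micro_macro_f1(truth, pred)
print(round(micro, 4), round(macro, 4))
```

Micro-F1 is dominated by frequent labels, while Macro-F1 weights every label equally, so a rare label that is never predicted (label 2 above) lowers Macro-F1 much more than Micro-F1.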

Citing

Please cite our paper if you use our code.

@inproceedings{10.1145/3292500.3330860,
author = {Yin, Yuan and Wei, Zhewei},
title = {Scalable Graph Embeddings via Sparse Transpose Proximities},
year = {2019},
isbn = {9781450362016},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3292500.3330860},
doi = {10.1145/3292500.3330860},
booktitle = {Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
pages = {1429--1437},
numpages = {9},
keywords = {network representation learning, personalized pagerank, graph embedding},
location = {Anchorage, AK, USA},
series = {KDD '19}
}
