MartinXPN / abcde

ABCDE: Approximating Betweenness-Centrality ranking with progressive-DropEdge

ABCDE

This work was published in the journal PeerJ Computer Science: https://peerj.com/articles/cs-699/

Link to the Overleaf project (PeerJ draft): https://www.overleaf.com/read/tphdqhycvwfk

ABCDE model architecture

Each Transition block is a sequence of {Linear → LayerNorm → PReLU → Dropout} layers, while each GCN block is a sequence of {GCNConv → PReLU → LayerNorm → Dropout} layers. The + symbol denotes concatenation. Each MaxPooling operation extracts the maximum value from the given GCN block.
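As an illustration of the layer ordering described above, here is a minimal PyTorch sketch of a single Transition block. The class name `TransitionBlock`, the feature sizes, and the dropout rate are assumptions made for this example and are not taken from the repository. The GCN block would follow the same pattern with a graph convolution first and PReLU applied before LayerNorm.

```python
import torch
from torch import nn


class TransitionBlock(nn.Module):
    """Linear -> LayerNorm -> PReLU -> Dropout (hypothetical sketch, not the repo's class)."""

    def __init__(self, in_features: int, out_features: int, dropout: float = 0.1):
        super().__init__()
        # The dropout rate and feature sizes are placeholders chosen for illustration.
        self.layers = nn.Sequential(
            nn.Linear(in_features, out_features),
            nn.LayerNorm(out_features),
            nn.PReLU(),
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


if __name__ == "__main__":
    block = TransitionBlock(8, 16).eval()  # eval() disables dropout
    node_features = torch.randn(4, 8)      # 4 nodes with 8 features each
    print(block(node_features).shape)
```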

To reproduce the results

# This will run the ABCDE model on both real-world and synthetic datasets and report the results
docker run martin97/abcde:latest

To run the training script

# Modify the hyperparameters in `train.py` and run it
python train.py

To evaluate the model

Download the evaluation data

cd datasets
./download.sh

Optionally, download the model reported in the paper

mkdir models && cd models
wget https://github.com/MartinXPN/abcde/releases/download/v1.0.0/best.ckpt

Then run the prediction script on the target dataset

# For a specific dataset
python predict.py real --model_path experiments/latest/models/best.ckpt \
                       --data_test datasets/real/amazon.txt \
                       --label_file datasets/real/amazon_score.txt

# To run the whole evaluation
python predict.py all --model_path experiments/latest/models/best.ckpt --datasets_dir datasets
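To evaluate several real-world datasets one at a time, the per-dataset command above can be assembled programmatically. The sketch below only builds the argument list. The helper name `build_predict_command` is hypothetical, and the `<name>.txt` / `<name>_score.txt` file naming is an assumption extrapolated from the `amazon` example, so it may not hold for every dataset.

```python
import shlex
from pathlib import PurePosixPath


def build_predict_command(dataset, model_path="experiments/latest/models/best.ckpt",
                          datasets_dir="datasets"):
    """Build the `predict.py real` command for one dataset (hypothetical helper)."""
    real_dir = PurePosixPath(datasets_dir) / "real"
    return [
        "python", "predict.py", "real",
        "--model_path", str(model_path),
        # Assumed naming pattern, based only on the amazon example.
        "--data_test", str(real_dir / f"{dataset}.txt"),
        "--label_file", str(real_dir / f"{dataset}_score.txt"),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(shlex.join(build_predict_command("amazon")))
```

The resulting list can be passed to `subprocess.run` once the paths have been checked.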

Results obtained so far (v1.0.0), compared with the original DrBC paper and sampling-based benchmarks

Results on real-world datasets (the model was run on a CPU machine with 80 cores and 512 GB of memory):

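| Metric | Dataset | ABRA | RK | KADABRA | Node2Vec | DrBC | ABCDE |
|---|---|---|---|---|---|---|---|
| Top-1% | com-youtube | 95.7 | 76.0 | 57.5 | 12.3 | 73.6 | 77.1 |
| Top-1% | amazon | 69.2 | 86.0 | 47.6 | 16.7 | 86.2 | 92.0 |
| Top-1% | Dblp | 49.7 | NA | 35.2 | 11.5 | 78.9 | 79.8 |
| Top-1% | cit-Patents | 37.0 | 74.4 | 23.4 | 0.04 | 48.3 | 50.2 |
| Top-1% | com-lj | 60.0 | 54.2 | 31.9 | 3.9 | 67.2 | 70.9 |
| Top-5% | com-youtube | 91.2 | 75.8 | 47.3 | 18.9 | 66.7 | 75.1 |
| Top-5% | amazon | 58.0 | 59.4 | 56.0 | 23.2 | 79.7 | 88.0 |
| Top-5% | Dblp | 45.5 | NA | 42.6 | 20.2 | 72.0 | 73.7 |
| Top-5% | cit-Patents | 42.4 | 68.2 | 25.1 | 0.29 | 57.5 | 58.3 |
| Top-5% | com-lj | 56.9 | NA | 39.5 | 10.35 | 72.6 | 75.7 |
| Top-10% | com-youtube | 89.5 | 100.0 | 44.6 | 23.6 | 69.5 | 77.6 |
| Top-10% | amazon | 60.3 | 100.0 | 56.7 | 26.6 | 76.9 | 85.6 |
| Top-10% | Dblp | 100.0 | NA | 50.4 | 27.7 | 72.5 | 76.3 |
| Top-10% | cit-Patents | 50.9 | 53.5 | 21.6 | 0.99 | 64.1 | 64.9 |
| Top-10% | com-lj | 63.6 | NA | 47.6 | 15.4 | 74.8 | 78.0 |
| Kendall Tau | com-youtube | 56.2 | 13.9 | NA | 46.2 | 57.3 | 59.8 |
| Kendall Tau | amazon | 16.3 | 9.7 | NA | 44.7 | 69.3 | 77.7 |
| Kendall Tau | Dblp | 14.3 | NA | NA | 49.5 | 71.9 | 73.7 |
| Kendall Tau | cit-Patents | 17.3 | 15.3 | NA | 4.0 | 72.6 | 73.5 |
| Kendall Tau | com-lj | 22.8 | NA | NA | 35.1 | 71.3 | 71.8 |
| Time (s) | com-youtube | 72898.7 | 125651.2 | 116.1 | 4729.8 | 402.9 | 26.7 |
| Time (s) | amazon | 5402.3 | 149680.6 | 244.7 | 10679.0 | 449.8 | 63.5 |
| Time (s) | Dblp | 11591.5 | NA | 398.1 | 17446.9 | 566.7 | 104.9 |
| Time (s) | cit-Patents | 10704.6 | 252028.5 | 568.0 | 11729.1 | 744.1 | 163.9 |
| Time (s) | com-lj | 34309.6 | NA | 612.9 | 18253.6 | 2274.2 | 271.0 |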
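For readers unfamiliar with the Top-N% and Kendall Tau rows, the sketch below shows one common way to compute such ranking metrics from ground-truth and predicted scores. The definitions used here are assumptions: Top-N% is taken as the overlap between the true and predicted top-N% node sets, and Kendall tau as (concordant − discordant) pairs divided by all pairs. The repository's own evaluation code may differ, for example in how it handles ties. The tables appear to report these values multiplied by 100.

```python
import math
from itertools import combinations


def top_n_accuracy(true_scores, pred_scores, percent):
    """Fraction of the true top-`percent`% nodes that also appear in the predicted top set."""
    n = len(true_scores)
    k = max(1, math.ceil(n * percent / 100))
    top_true = set(sorted(range(n), key=lambda i: true_scores[i], reverse=True)[:k])
    top_pred = set(sorted(range(n), key=lambda i: pred_scores[i], reverse=True)[:k])
    return len(top_true & top_pred) / k


def kendall_tau(true_scores, pred_scores):
    """Kendall tau-a: (concordant - discordant) / total pairs. O(n^2), for small inputs only."""
    n = len(true_scores)
    if n < 2:
        raise ValueError("need at least two nodes")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (true_scores[i] - true_scores[j]) * (pred_scores[i] - pred_scores[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Toy scores for five nodes, invented purely for illustration.
    truth = [0.9, 0.5, 0.3, 0.1, 0.0]
    predicted = [0.8, 0.2, 0.4, 0.1, 0.05]
    print(top_n_accuracy(truth, predicted, 40))  # 0.5
    print(kendall_tau(truth, predicted))         # 0.8
```

The quadratic pair loop is only practical for small graphs. For graphs of the sizes listed in the table, an O(n log n) implementation such as `scipy.stats.kendalltau` would be the usual choice.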

Results on synthetic datasets (the model was run on a CPU machine with 80 cores and 512 GB of memory):

| Metric | Scale | ABRA | RK | k-BC | KADABRA | Node2Vec | DrBC | ABCDE |
|---|---|---|---|---|---|---|---|---|
| Top-1% | 5000 | 97.8±1.5 | 96.8±1.7 | 94.1±0.8 | 76.2±12.5 | 19.1±4.8 | 96.5±1.8 | 97.5±1.3 |
| Top-1% | 10000 | 97.2±1.2 | 96.4±1.3 | 93.3±3.1 | 74.6±16.5 | 21.2±4.3 | 96.7±1.2 | 96.9±0.9 |
| Top-1% | 20000 | 96.5±1.0 | 95.5±1.1 | 91.6±4.0 | 74.6±16.7 | 16.1±3.9 | 95.6±0.9 | 96.0±1.2 |
| Top-1% | 50000 | 94.6±0.7 | 93.3±0.9 | 90.1±4.7 | 73.8±14.9 | 9.6±1.3 | 92.5±1.2 | 93.6±0.9 |
| Top-1% | 100000 | 92.2±0.8 | 91.5±0.8 | 88.6±4.7 | 67.0±12.4 | 9.6±1.3 | 90.3±0.9 | 91.8±0.6 |
| Top-5% | 5000 | 96.9±0.7 | 95.6±0.9 | 89.3±3.9 | 68.7±13.4 | 23.3±3.6 | 95.9±0.9 | 97.8±0.7 |
| Top-5% | 10000 | 95.6±0.8 | 94.1±0.8 | 88.4±5.1 | 70.7±13.8 | 20.5±2.7 | 95.0±0.8 | 97.0±0.6 |
| Top-5% | 20000 | 93.9±0.8 | 92.2±0.9 | 86.9±6.2 | 69.1±13.5 | 16.9±2.0 | 93.0±1.1 | 95.2±0.8 |
| Top-5% | 50000 | 90.1±0.8 | 88.0±0.8 | 84.4±7.2 | 65.8±11.7 | 13.8±1.0 | 89.2±1.1 | 92.1±0.6 |
| Top-5% | 100000 | 85.6±1.1 | 87.6±0.5 | 82.4±7.5 | 57.0±9.4 | 12.9±1.2 | 86.2±0.9 | 89.7±0.5 |
| Top-10% | 5000 | 96.1±0.7 | 94.3±0.9 | 86.7±4.5 | 67.2±12.5 | 25.4±3.4 | 94.8±0.7 | 97.6±0.4 |
| Top-10% | 10000 | 94.1±0.6 | 92.2±0.9 | 86.0±5.9 | 67.8±13.0 | 25.4±3.4 | 94.0±0.9 | 96.8±0.6 |
| Top-10% | 20000 | 92.1±0.8 | 90.6±0.9 | 84.5±6.8 | 66.1±12.4 | 19.9±1.9 | 91.9±0.9 | 94.9±0.5 |
| Top-10% | 50000 | 87.4±0.9 | 88.2±0.5 | 82.1±8.0 | 61.3±10.4 | 18.0±1.2 | 87.9±1.0 | 91.7±0.6 |
| Top-10% | 100000 | 81.8±1.5 | 87.4±0.4 | 80.1±8.2 | 52.4±8.2 | 17.3±1.3 | 85.0±0.9 | 89.4±0.5 |
| Kendall Tau | 5000 | 86.6±1.0 | 78.6±0.6 | 66.2±11.4 | NA | 11.3±3.0 | 88.4±0.3 | 93.7±0.2 |
| Kendall Tau | 10000 | 81.6±1.2 | 72.3±0.6 | 67.2±13.5 | NA | 8.5±2.3 | 86.8±0.4 | 93.3±0.1 |
| Kendall Tau | 20000 | 76.9±1.5 | 65.5±1.2 | 67.1±14.3 | NA | 7.5±2.2 | 84.0±0.5 | 92.1±0.1 |
| Kendall Tau | 50000 | 68.2±1.3 | 53.3±1.4 | 66.2±14.1 | NA | 7.1±1.8 | 80.1±0.5 | 90.1±0.2 |
| Kendall Tau | 100000 | 60.3±1.9 | 44.2±0.2 | 64.9±13.5 | NA | 7.1±1.9 | 77.8±0.4 | 88.4±0.2 |
| Time (s) | 5000 | 18.5±3.6 | 17.1±3.0 | 12.2±6.3 | 0.6±0.1 | 32.4±3.8 | 0.3±0.0 | 0.5±0.0 |
| Time (s) | 10000 | 29.2±4.8 | 21.0±3.6 | 47.2±27.3 | 1.0±0.2 | 73.1±7.0 | 0.6±0.0 | 0.6±0.0 |
| Time (s) | 20000 | 52.7±8.1 | 43.0±3.2 | 176.4±105.1 | 1.6±0.3 | 129.3±17.6 | 1.4±0.0 | 0.9±0.0 |
| Time (s) | 50000 | 168.3±23.8 | 131.4±2.0 | 935.1±505.9 | 3.9±1.0 | 263.2±46.6 | 3.9±0.2 | 2.2±0.0 |
| Time (s) | 100000 | 380.3±63.7 | 363.4±36.3 | 3069.2±1378.5 | 7.2±1.8 | 416.2±37.0 | 8.2±0.3 | 3.2±0.0 |

License: MIT License

