GraphDetec / OS-GNN

Over-Sampling Strategy in Feature Space for Graphs based Class-imbalanced Bot Detection

Environment Settings

  • python == 3.7
  • torch == 1.8.1+cu102
  • numpy == 1.21.6
  • scipy == 1.7.2
  • pandas == 1.3.5
  • scikit-learn == 1.0.2
  • torch-cluster == 1.5.9
  • torch-geometric == 2.0.4
  • torch-scatter == 2.0.8
  • torch-sparse == 0.6.12
  • torch-spline-conv == 1.2.1

Usage

Run models on bot detection datasets

python OS-GNN.py -dataset dataset -model model -smote smote
  • dataset: one of [MGTAB, Twibot20, Cresci15].
  • model: one of [GCN, GAT, SAGE, RGCN].
  • smote: one of [True, False].

e.g.

python OS-GNN.py -dataset MGTAB -model GCN -smote False
python OS-GNN.py -dataset MGTAB -model GCN -smote True
python OS-GNN.py -dataset Twibot20 -model GAT -smote True
python OS-GNN.py -dataset Cresci15 -model RGCN -smote True
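The -smote option toggles over-sampling. As a rough illustration of what SMOTE-style over-sampling in a feature space looks like, the sketch below interpolates between minority-class samples and their nearest minority neighbours until the two classes are balanced. It is not the code in OS-GNN.py: the function name `smote_oversample`, the parameter `k`, and the choice to balance the classes exactly are assumptions made for this example.

```python
import numpy as np


def smote_oversample(features, labels, minority_label, k=5, seed=0):
    """Append synthetic minority samples until both classes have equal size.

    Hypothetical helper for a binary problem; not taken from the repository.
    """
    rng = np.random.default_rng(seed)
    minority = features[labels == minority_label]
    n_new = int((labels != minority_label).sum() - len(minority))
    if n_new <= 0 or len(minority) < 2:
        return features, labels

    k = min(k, len(minority) - 1)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbors = np.argsort(dists, axis=1)[:, :k]

    # Pick a base sample and one of its k neighbours for each synthetic point.
    base = rng.integers(0, len(minority), size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    lam = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    synthetic = minority[base] + lam * (minority[picked] - minority[base])

    new_features = np.vstack([features, synthetic])
    new_labels = np.concatenate(
        [labels, np.full(n_new, minority_label, dtype=labels.dtype)]
    )
    return new_features, new_labels
```

Each synthetic point is a convex combination of two real minority samples, so it always lies on the segment between them. On a graph, the new points would also need edges or be used only in the classifier head; that step is left out here.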

Run models on subgraphs

(different imbalance ratios)

python subgraph-OS-GNN.py -dataset dataset -model model -smote smote -ratio ratio
  • ratio: a value in the interval [0, 1].

e.g.

python subgraph-OS-GNN.py -dataset MGTAB -model GCN -smote False -ratio 0.05
python subgraph-OS-GNN.py -dataset Twibot20 -model GAT -smote False -ratio 0.20
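To make the idea of a subgraph with a chosen imbalance ratio concrete, here is a minimal sketch. It assumes that ratio means the fraction of minority-class nodes among the kept nodes; the README does not define it, so subgraph-OS-GNN.py may use a different definition. The function name `subsample_to_ratio` and the 2 x E `edge_index` layout are also assumptions for illustration.

```python
import numpy as np


def subsample_to_ratio(labels, edge_index, minority_label, ratio, seed=0):
    """Keep all majority nodes and a random subset of minority nodes.

    Assumption: `ratio` is the target share of minority nodes among kept nodes.
    Returns the kept node indices and the edges whose endpoints are both kept.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("this sketch requires 0 <= ratio < 1")
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    edge_index = np.asarray(edge_index)

    minority = np.flatnonzero(labels == minority_label)
    majority = np.flatnonzero(labels != minority_label)

    # Solve m / (m + M) = ratio for m, capped at the available minority nodes.
    target = int(round(ratio * len(majority) / (1.0 - ratio)))
    target = min(target, len(minority))
    kept_minority = rng.choice(minority, size=target, replace=False)

    kept = np.sort(np.concatenate([majority, kept_minority]))
    mask = np.zeros(len(labels), dtype=bool)
    mask[kept] = True
    # Induced subgraph: keep an edge only if both endpoints survive.
    edge_mask = mask[edge_index[0]] & mask[edge_index[1]]
    return kept, edge_index[:, edge_mask]
```

The returned edges still use the original node ids; a real pipeline would usually relabel them to a contiguous range before training.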

Run reweighting method

python GNN-reweight.py -dataset dataset -model model -reweight reweight -gamma gamma
  • reweight: one of [CB, FL] (class-balanced loss or focal loss).
  • smote: one of [True, False].
  • beta: parameter for CB loss (default = 0.9999).
  • gamma: parameter for reweighting (default = 2.0).
  • alpha: parameter for focal loss (default = 0.5).

e.g.

python GNN-reweight.py -dataset MGTAB -model GCN -reweight CB --beta 0.99
python GNN-reweight.py -dataset MGTAB -model GCN -reweight FL --alpha 0.4
python GNN-reweight.py -dataset Twibot20 -model GCN -reweight FL --alpha 0.8
python GNN-reweight.py -dataset Cresci15 -model GCN -reweight FL --alpha 0.6
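For readers unfamiliar with the two reweighting options, the sketch below shows the standard formulations of class-balanced weights and binary focal loss and how beta, gamma, and alpha enter them. It uses plain NumPy and is not the code in GNN-reweight.py; the function names, the normalisation of the weights, and the convention that alpha weights class 1 are assumptions for this example.

```python
import numpy as np


def class_balanced_weights(class_counts, beta=0.9999):
    """Per-class weights proportional to 1 / effective number of samples.

    Effective number for a class with n samples: (1 - beta**n) / (1 - beta).
    Weights are normalised to sum to the number of classes (an assumption).
    """
    counts = np.asarray(class_counts, dtype=float)
    effective_num = (1.0 - beta ** counts) / (1.0 - beta)
    weights = 1.0 / effective_num
    return weights * len(counts) / weights.sum()


def focal_loss(probs, targets, alpha=0.5, gamma=2.0):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    `probs` are predicted probabilities of class 1; `alpha` weights class 1
    and `1 - alpha` weights class 0 (assumed convention).
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0 - 1e-12)
    targets = np.asarray(targets)
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With gamma = 0 and alpha = 0.5 the focal loss reduces to half the ordinary cross-entropy; larger gamma shrinks the contribution of samples that are already classified confidently.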

Results

GCN

Dataset    Accuracy         F1-macro         Balanced accuracy
TwiBot-20  68.76$_{0.60}$   68.30$_{0.51}$   68.29$_{0.62}$
Cresci-15  96.50$_{0.36}$   96.20$_{0.42}$   95.95$_{0.53}$
MGTAB      82.69$_{0.76}$   74.85$_{1.32}$   72.32$_{1.29}$

OS-GNN (backbone GCN)

Dataset    Accuracy         F1-macro         Balanced accuracy
TwiBot-20  83.44$_{0.40}$   83.18$_{0.35}$   83.12$_{0.24}$
Cresci-15  96.73$_{0.30}$   96.46$_{0.18}$   96.43$_{0.19}$
MGTAB      85.84$_{0.92}$   83.27$_{0.80}$   85.81$_{0.33}$

GAT

Dataset    Accuracy         F1-macro         Balanced accuracy
TwiBot-20  72.80$_{0.11}$   72.31$_{0.27}$   71.57$_{0.88}$
Cresci-15  96.49$_{0.15}$   96.18$_{0.30}$   95.86$_{0.39}$
MGTAB      84.46$_{1.13}$   80.47$_{1.29}$   79.35$_{1.58}$

OS-GNN (backbone GAT)

Dataset    Accuracy         F1-macro         Balanced accuracy
TwiBot-20  82.49$_{0.42}$   82.30$_{0.37}$   82.41$_{0.25}$
Cresci-15  96.65$_{0.36}$   96.38$_{0.39}$   96.35$_{0.40}$
MGTAB      86.75$_{0.74}$   85.39$_{0.71}$   87.18$_{0.50}$
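The three metrics reported in these tables (accuracy, F1-macro, balanced accuracy) follow standard definitions. The sketch below computes them with NumPy from true and predicted labels; it is only an illustration with a hypothetical function name `evaluate`, and the repository may instead rely on scikit-learn, which is listed in its environment.

```python
import numpy as np


def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged F1 and balanced accuracy.

    Classes are taken from `y_true`, so every class has at least one
    true sample and recall is always defined.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls, f1_scores = [], []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        recalls.append(tp / (tp + fn))
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return {
        "accuracy": float(np.mean(y_true == y_pred)),
        "f1_macro": float(np.mean(f1_scores)),          # unweighted mean of per-class F1
        "balanced_accuracy": float(np.mean(recalls)),   # unweighted mean of per-class recall
    }
```

Balanced accuracy and F1-macro give each class equal weight, which is why they can differ noticeably from plain accuracy on imbalanced datasets.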

Dataset

For TwiBot-20, please visit the TwiBot-20 GitHub repository. For MGTAB, please visit the MGTAB GitHub repository. For Cresci-15, please visit the TwiBot-20 GitHub repository.

We also offer the processed datasets: Cresci-15, MGTAB, TwiBot-20.

License: MIT License

