Over-Sampling Strategy in Feature Space for Graphs based Class-imbalanced Bot Detection
- python == 3.7
- torch == 1.8.1+cu102
- numpy == 1.21.6
- scipy == 1.7.2
- pandas == 1.3.5
- scikit-learn == 1.0.2
- torch-cluster == 1.5.9
- torch-geometric == 2.0.4
- torch-scatter == 2.0.8
- torch-sparse == 0.6.12
- torch-spline-conv == 1.2.1
python OS-GNN.py -dataset dataset -model model -smote smote
- dataset: one of [MGTAB, Twibot20, Cresci15].
- model: one of ['GCN', 'GAT', 'SAGE', 'RGCN'].
- smote: one of [True, False].
e.g.
python OS-GNN.py -dataset MGTAB -model GCN -smote False
python OS-GNN.py -dataset MGTAB -model GCN -smote True
python OS-GNN.py -dataset Twibot20 -model GAT -smote True
python OS-GNN.py -dataset Cresci15 -model RGCN -smote True
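The `-smote` option switches over-sampling on or off. As a rough illustration of what SMOTE-style over-sampling in a feature space does, here is a minimal sketch in plain Python. It is not the code in `OS-GNN.py`: the function name `smote_oversample`, its parameters, and the toy feature vectors are assumptions made for this example, and the script may over-sample in a different (for instance learned) feature space.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic vectors by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style).

    Hypothetical helper for illustration only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, minority[j]))
        )
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment between x and its neighbour.
        lam = rng.random()
        synthetic.append([a + lam * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


# Toy minority-class feature vectors (assumed data, not from any dataset).
minority_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote_oversample(minority_features, n_new=6)
```

Each synthetic vector lies on a line segment between two existing minority samples, so the new points stay inside the region already occupied by the minority class.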
To run with a different imbalance ratio:
python subgraph-OS-GNN.py -dataset dataset -model model -smote smote -ratio ratio
- ratio: imbalance ratio, a value in the interval [0, 1].
e.g.
python subgraph-OS-GNN.py -dataset MGTAB -model GCN -smote False -ratio 0.05
python subgraph-OS-GNN.py -dataset Twibot20 -model GAT -smote False -ratio 0.20
python GNN-reweight.py -dataset dataset -model model -reweight reweight -gamma gamma
- reweight: one of [CB, FL] (CB loss or focal loss).
- smote: one of [True, False].
- beta: parameter for CB loss. (default = 0.9999)
- gamma: parameter for reweighting. (default = 2.0)
- alpha: parameter for focal loss. (default = 0.5)
e.g.
python GNN-reweight.py -dataset MGTAB -model GCN -reweight CB --beta 0.99
python GNN-reweight.py -dataset MGTAB -model GCN -reweight FL --alpha 0.4
python GNN-reweight.py -dataset Twibot20 -model GCN -reweight FL --alpha 0.8
python GNN-reweight.py -dataset Cresci15 -model GCN -reweight FL --alpha 0.6
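To make the roles of `beta`, `alpha` and `gamma` more concrete, the sketch below computes class-balanced weights from the effective number of samples and a single-sample focal loss, using the default values listed above. It follows the commonly published forms of these two losses and is not taken from `GNN-reweight.py`. The function names and the class counts are assumptions, and the focal loss here uses one `alpha` for all classes, which is a simplification.

```python
import math


def class_balanced_weights(samples_per_class, beta=0.9999):
    """Weights proportional to the inverse effective number of samples,
    (1 - beta) / (1 - beta ** n), rescaled to sum to the number of classes.

    Every count n must be positive, otherwise the denominator is zero.
    """
    raw = [(1.0 - beta) / (1.0 - beta ** n) for n in samples_per_class]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def focal_loss(p_true, alpha=0.5, gamma=2.0):
    """Focal loss for one sample, where p_true is the predicted probability
    of the correct class: -alpha * (1 - p_true) ** gamma * log(p_true)."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# Assumed class counts: 9000 majority samples, 1000 minority samples.
weights = class_balanced_weights([9000, 1000], beta=0.9999)

# A confidently correct prediction versus a badly wrong one.
easy = focal_loss(0.9)
hard = focal_loss(0.1)
```

The minority class receives the larger weight, and the focal term `(1 - p_true) ** gamma` shrinks the loss of well-classified samples so that hard samples dominate the gradient. With `gamma = 0` and `alpha = 1` the focal loss reduces to ordinary cross-entropy.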
GCN
Dataset | Accuracy | F1-macro | Balanced accuracy |
---|---|---|---|
TwiBot-20 | 68.76 | 68.30 | 68.29 |
Cresci-15 | 96.50 | 96.20 | 95.95 |
MGTAB | 82.69 | 74.85 | 72.32 |
OS-GNN (backbone GCN)
Dataset | Accuracy | F1-macro | Balanced accuracy |
---|---|---|---|
TwiBot-20 | 83.44 | 83.18 | 83.12 |
Cresci-15 | 96.73 | 96.46 | 96.43 |
MGTAB | 85.84 | 83.27 | 85.81 |
GAT
Dataset | Accuracy | F1-macro | Balanced accuracy |
---|---|---|---|
TwiBot-20 | 72.80 | 72.31 | 71.57 |
Cresci-15 | 96.49 | 96.18 | 95.86 |
MGTAB | 84.46 | 80.47 | 79.35 |
OS-GNN (backbone GAT)
Dataset | Accuracy | F1-macro | Balanced accuracy |
---|---|---|---|
TwiBot-20 | 82.49 | 82.30 | 82.41 |
Cresci-15 | 96.65 | 96.38 | 96.35 |
MGTAB | 86.75 | 85.39 | 87.18 |
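The three reported metrics can diverge sharply on imbalanced data, which is why F1-macro and balanced accuracy are listed next to plain accuracy. The sketch below computes all three from label lists in plain Python. The helper `evaluate` and the toy labels are assumptions for illustration and are not the repository's evaluation code. scikit-learn's `accuracy_score`, `f1_score(..., average='macro')` and `balanced_accuracy_score` provide library versions of these metrics.

```python
def evaluate(y_true, y_pred):
    """Return accuracy, macro-averaged F1 and balanced accuracy.

    Classes are taken from y_true, so every class has at least one
    true sample and its recall is well defined.
    """
    classes = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1s = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        recalls.append(tp / (tp + fn))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return {
        "accuracy": accuracy,
        "f1_macro": sum(f1s) / len(f1s),
        "balanced_accuracy": sum(recalls) / len(recalls),
    }


# Toy example: 8 majority samples, 2 minority samples, and a classifier
# that always predicts the majority class.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 10
scores = evaluate(y_true, y_pred)
```

In this toy case the accuracy is 0.8 even though no minority sample is detected, while balanced accuracy is 0.5 and F1-macro is about 0.44.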
For TwiBot-20, please visit the Twibot-20 GitHub repository. For MGTAB, please visit the MGTAB GitHub repository. For Cresci-15, please visit the Twibot-20 GitHub repository.
We also offer the processed datasets: Cresci-15, MGTAB, Twibot-20.