lhbrichard / rumor-detection

Public code for the paper "A Graph Convolutional Encoder and Decoder Model for Rumor Detection", accepted at DSAA 2020.

Repository from GitHub: https://github.com/lhbrichard/rumor-detection



Table of Contents

  • data
    After decompressing data.rar, you get three folders named Twitter15, Twitter16 and Weibo. Each directory contains two types of file: a feature file and a label file.
    The feature file is tab-delimited ('\t') and contains the fields 'eid', 'indexP', 'indexC', 'max_degree', 'maxL' and 'Vec':

      eid: root id
      indexP: index of parent
      indexC: index of current
      max_degree: the total number of the parent node in the tree
      maxL: the maximum length of all the texts from the tree
      Vec: list of index and count

In the label file, every root id corresponds to a label.
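The feature-file layout described above can be read with a few lines of Python. This is only a minimal sketch, not the repository's own loader: the helper `parse_feature_line` and the `NodeRecord` type are hypothetical names, the sample line is made up for illustration, and the sketch assumes that 'Vec' is stored as space-separated `index:count` pairs.

```python
from typing import Dict, NamedTuple


class NodeRecord(NamedTuple):
    """One row of a feature file (hypothetical container, not from the repo)."""
    eid: str                # root id of the tree this node belongs to
    index_p: str            # index of the parent; kept as text because the
                            # root's parent marker may not be numeric
    index_c: int            # index of the current node
    max_degree: int
    max_l: int
    vec: Dict[int, int]     # word index -> count


def parse_feature_line(line: str) -> NodeRecord:
    """Split one tab-delimited line into its six fields."""
    eid, index_p, index_c, max_degree, max_l, vec = line.rstrip("\n").split("\t")
    counts = {}
    # Assumption: 'Vec' looks like "3:1 17:2 164:1".
    for pair in vec.split():
        idx, cnt = pair.split(":")
        counts[int(idx)] = int(cnt)
    return NodeRecord(eid, index_p, int(index_c), int(max_degree), int(max_l), counts)


# Made-up sample line, only to show the shape of the data.
sample = "1001\tNone\t1\t2\t9\t1:1 3:1 164:1"
record = parse_feature_line(sample)
print(record.eid, record.index_c, record.vec)
```

If the actual files separate the index and count differently, only the inner loop needs to change.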

  • Process
    • getTwittergraph.py
      Processes the feature file and records the relationships between nodes. It also stores the feature matrix of each node, then saves all the information to files in '.npy' format.
    • getWeibograph.py
      Performs the same operations as getTwittergraph.py.
    • rand5fold.py
      Processes the label file and generates 5-fold lists for the validation set and the training set.
    • process.py
      Defines a custom PyG graph dataset so that the data can be loaded in batches.
  • tools
    • earlystopping.py
      In the experiments, patience is set to 10: when the score has not improved for 10 iterations, training stops early and the model result is saved.
    • earlystopping2class.py
      Performs the same operations as earlystopping.py, but for the Weibo dataset.
    • evaluate.py
      Defines evaluation criteria such as accuracy and F1 score.
  • model
    • GAE.py   Our base model using GAE as Decoder Module
    • VGAE.py   Our base model using VGAE as Decoder Module
    • only_gcn.py   Comparative trial
    • MVAE.py   Comparative trial
    • add_root_info.py   Trick intended to improve the representation of the data
    • base_BU.py   Reverse the data flow
    • bidirect.py   Try to use two directions of data flow
    • Model_Twitter.py   Main function to run on Twitter
    • Model_Weibo.py   Main function to run on Weibo
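The patience rule described for earlystopping.py can be illustrated as follows. This is a simplified sketch, not the code in tools/: the class name `PatienceStopper` is hypothetical, it assumes that a higher score is better, and it leaves out the model-saving step.

```python
class PatienceStopper:
    """Stop training once the score has not improved for `patience` checks."""

    def __init__(self, patience: int = 10):
        self.patience = patience
        self.best_score = None
        self.bad_rounds = 0       # consecutive checks without improvement
        self.should_stop = False

    def step(self, score: float) -> bool:
        """Record a new score; return True if it is a new best."""
        if self.best_score is None or score > self.best_score:
            self.best_score = score
            self.bad_rounds = 0
            return True           # a real trainer would save the model here
        self.bad_rounds += 1
        if self.bad_rounds >= self.patience:
            self.should_stop = True
        return False


# Toy run with a short patience so the stop is visible; scores are made up.
stopper = PatienceStopper(patience=3)
scores = [0.70, 0.75, 0.74, 0.73, 0.72, 0.80]
stopped_at = None
for i, s in enumerate(scores):
    stopper.step(s)
    if stopper.should_stop:
        stopped_at = i
        break
print(stopped_at, stopper.best_score)
```

With patience 3, the loop stops after three non-improving scores in a row and never sees the last value, which is the trade-off of any patience-based rule.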


We use the same set of hyperparameters for all models in our experiments: batch size 128, hidden dimension 64, 50 training epochs and learning rate 5e-4. We randomly split the datasets, run 5-fold cross-validation, and use accuracy (acc.) and F1 as the evaluation criteria.
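The evaluation protocol above (random 5-fold split, accuracy and F1) can be sketched in plain Python. This is an illustration under assumptions, not the logic of rand5fold.py or evaluate.py: the function names, the seed and the round-robin fold assignment are all hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple


def five_fold_splits(ids: Sequence, seed: int = 0) -> List[Tuple[list, list]]:
    """Shuffle ids once, then return five (train, valid) pairs."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]   # round-robin assignment
    splits = []
    for k in range(5):
        valid = folds[k]
        train = [x for j, fold in enumerate(folds) if j != k for x in fold]
        splits.append((train, valid))
    return splits


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true: Sequence, y_pred: Sequence, cls) -> float:
    """F1 score of a single class, treating it as the positive label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


splits = five_fold_splits(range(10))
# Made-up labels and predictions, only to exercise the metrics.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
acc = accuracy(y_true, y_pred)
f1 = f1_for_class(y_true, y_pred, cls=1)
print(len(splits), acc, round(f1, 3))
```

Every id appears in exactly one validation fold, so averaging the per-fold scores uses each example once for validation.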

Quick start

Step1: Prepare Data

After decompressing data.rar, run:

python getTwittergraph.py

Step2: Train Model

The script takes two arguments: the first is the dataset name and the second is the model name ('GCN', 'GAE' or 'VGAE').

python Model_Twitter.py Twitter15 VGAE


Only part of the experimental results is shown here; more details can be found in the paper.

model_name \ acc.   Twitter   Weibo
baseline            0.737     0.908
only GCN            0.840     0.935
AE-GCN              0.851     0.942
VAE-GCN             0.856     0.944

Besides the main experiment, we also tried some tricks to improve the model, but they gave worse results.

model_name                result
only GCN                  0.8396
one-layer GCN             0.8498
two-layers GCN            0.8367
GAT                       0.7879
GCN add root              0.7374
bidirect                  0.8294
GAE                       0.8498
Bottom-up direction GAE   0.3535

