maysaa / FakeNewsDetection

FakeNewsDetection

Arabic Fake News corpora:
The following are two Arabic corpora for the task of fake news detection:

  1. Manually Annotated Corpus:

The annotation process resulted in a corpus containing 1,537 tweets (835 fake and 702 genuine), after excluding duplicated tweets, tweets that contain mixed fake and genuine news, and tweets where the fake news was meant as sarcasm. Statistical information about the manually annotated corpus is shown in the following table:

| Statistic | Fake Tweets | Not Fake Tweets |
| --- | --- | --- |
| Total Tweets | 835 | 702 |
| Total Words | 20,395 | 19,852 |
| Unique Words | 6,246 | 7,115 |
| Total Characters | 117,630 | 113,121 |
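The statistics in the table can be recomputed from raw tweet text. The snippet below is a minimal sketch, not the script used for this repository: it assumes whitespace tokenization and counts every character (including spaces), neither of which is specified above, and `corpus_stats` is a hypothetical helper name.

```python
def corpus_stats(tweets):
    """Compute tweet, word, unique-word, and character counts for a list of strings.

    Assumption: words are whitespace-separated tokens and characters include
    spaces; the README does not state which conventions were actually used.
    """
    tokens = [token for text in tweets for token in text.split()]
    return {
        "total_tweets": len(tweets),
        "total_words": len(tokens),
        "unique_words": len(set(tokens)),
        "total_characters": sum(len(text) for text in tweets),
    }


# Usage: compute the statistics separately for each class (placeholder data).
fake_stats = corpus_stats(["first placeholder tweet", "second placeholder tweet"])
```

Different tokenization or normalization choices (for example, stripping diacritics or punctuation) will change the word counts, so results from this sketch may not match the table exactly.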
  2. Automatically Annotated Corpus:
    We trained different machine learning classifiers on the manually annotated corpus and used the best-performing classifier to automatically predict the fake news classes of the remaining unlabeled tweets. The outcome of the prediction process is 34,529 tweets (19,582 fake and 14,947 genuine), as shown in the following table:
| Statistic | Fake Tweets | Not Fake Tweets |
| --- | --- | --- |
| Total Tweets | 19,582 | 14,947 |
| Total Words | 479,349 | 463,768 |
| Unique Words | 79,383 | 88,037 |
| Total Characters | 2,855,454 | 2,680,067 |
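The automatic annotation step described above (train several classifiers on the manual corpus, keep the best one, label the remaining tweets) could look roughly like the following. This is a sketch under assumptions, not the repository's actual pipeline: it assumes scikit-learn, TF-IDF features, macro-F1 cross-validation as the selection criterion, and only three candidate models. The function name `pick_best_and_label` and the English placeholder tweets are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def pick_best_and_label(labeled_texts, labels, unlabeled_texts, cv=3):
    """Select the best classifier by cross-validation, then label new texts."""
    # Assumption: a small candidate set with TF-IDF features.
    candidates = {
        "NB": MultinomialNB(alpha=0.5),
        "LR": LogisticRegression(),
        "SVM": SVC(C=1.0, kernel="linear"),
    }
    scores = {}
    for name, classifier in candidates.items():
        pipeline = make_pipeline(TfidfVectorizer(), classifier)
        # Assumption: macro-averaged F1 is the selection metric.
        fold_scores = cross_val_score(
            pipeline, labeled_texts, labels, cv=cv, scoring="f1_macro"
        )
        scores[name] = fold_scores.mean()

    # Refit the winning model on all manually labeled data.
    best_name = max(scores, key=scores.get)
    best_pipeline = make_pipeline(TfidfVectorizer(), candidates[best_name])
    best_pipeline.fit(labeled_texts, labels)
    return best_name, list(best_pipeline.predict(unlabeled_texts))


# Usage with placeholder data (1 = fake, 0 = genuine).
manual_texts = [
    "miracle cure found doctors shocked",
    "secret cure hidden from public",
    "shocking secret miracle revealed",
    "celebrity death hoax shocking news",
    "ministry announces new school schedule",
    "official report on city budget",
    "ministry publishes official weather report",
    "city council announces new budget",
]
manual_labels = [1, 1, 1, 1, 0, 0, 0, 0]
remaining = ["shocking miracle cure", "official ministry report"]
best_name, predicted = pick_best_and_label(manual_texts, manual_labels, remaining)
```

Labels produced this way inherit the errors of the selected classifier, so the automatically annotated corpus is noisier than the manually annotated one.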

Machine Learning Classifiers:
Six machine learning classifiers were used to perform fake news classification on both corpora: Naïve Bayes (NB) [19], Logistic Regression (LR), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Random Forest (RF, a bagging model), and eXtreme Gradient Boosting (XGB). The following hyper-parameters were used with each classifier:
• NB: alpha=0.5
• LR: with default values
• SVM: c=1.0, kernel=linear, gamma=3
• MLP: activation function=ReLU, maximum iterations=30, learning rate=0.1
• RF: with default values
• XGB: with default values
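For readers who want to reproduce the settings listed above, here is one way they might map onto scikit-learn and xgboost estimators. This is a hedged sketch: the README does not name the libraries or the Naïve Bayes variant, so `MultinomialNB`, the use of `learning_rate_init` for the MLP learning rate, and the helper name `build_classifiers` are all assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def build_classifiers():
    """Return the six classifiers configured with the listed hyper-parameters."""
    classifiers = {
        # Assumption: the multinomial variant of Naive Bayes.
        "NB": MultinomialNB(alpha=0.5),
        "LR": LogisticRegression(),
        # In scikit-learn, gamma has no effect when kernel="linear".
        "SVM": SVC(C=1.0, kernel="linear", gamma=3),
        # Assumption: "learning rate" means the initial learning rate.
        "MLP": MLPClassifier(activation="relu", max_iter=30, learning_rate_init=0.1),
        "RF": RandomForestClassifier(),
    }
    try:
        # xgboost is a separate package; skip XGB if it is not installed.
        from xgboost import XGBClassifier

        classifiers["XGB"] = XGBClassifier()
    except ImportError:
        pass
    return classifiers


classifiers = build_classifiers()
```

With only 30 iterations, the MLP may stop before converging, in which case scikit-learn emits a `ConvergenceWarning`; the model is still usable.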

About

License:MIT License


Languages

Language:Python 100.0%