Arabic Fake News corpora:
The following are two Arabic corpora for the task of fake news detection:
- Manually Annotated Corpus:
The annotation process yielded a corpus of 1,537 tweets (835 fake and 702 genuine), after excluding duplicate tweets, tweets containing a mix of fake and genuine news, and tweets where the fake news was intended as sarcasm. Statistical information about the manually annotated corpus is shown in the following table:
| | Fake Tweets | Not Fake Tweets |
|---|---|---|
Total Tweets | 835 | 702 |
Total Words | 20,395 | 19,852 |
Unique Words | 6,246 | 7,115 |
Total Characters | 117,630 | 113,121 |
- Automatically Annotated Corpus:
We trained several machine learning classifiers on the manually annotated corpus and used the best-performing classifier to automatically predict the fake news class of the remaining unlabeled tweets. The prediction process yielded 34,529 tweets (19,582 fake and 14,947 genuine), as shown in the following table.
| | Fake Tweets | Not Fake Tweets |
|---|---|---|
Total Tweets | 19,582 | 14,947 |
Total Words | 479,349 | 463,768 |
Unique Words | 79,383 | 88,037 |
Total Characters | 2,855,454 | 2,680,067 |
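The automatic annotation step described above can be sketched as follows: fit a classifier on the manually labeled tweets, then use its predictions as labels for the unlabeled pool. The tiny in-memory dataset and the TF-IDF + Logistic Regression pipeline below are illustrative assumptions, not the exact setup used to build the corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder manually labeled corpus (1 = fake, 0 = not fake);
# the real corpus contains 1,537 annotated tweets.
labeled_texts = [
    "breaking shocking miracle cure revealed",
    "official statement issued by the ministry",
    "secret remedy exposed share before deleted",
    "ministry confirms report in press briefing",
]
labels = [1, 0, 1, 0]

# Unlabeled tweets awaiting automatic annotation.
unlabeled_texts = [
    "shocking miracle claim revealed",
    "official press report confirms statement",
]

# Fit the (assumed best-performing) classifier on the labeled data.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(labeled_texts, labels)

# Predicted classes become the automatic annotations.
auto_labels = clf.predict(unlabeled_texts)
print(list(auto_labels))
```

In the corpus construction, the same idea was applied at scale, producing the 34,529 automatically labeled tweets summarized in the table above.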
Machine Learning Classifiers:
Six machine learning classifiers were used to perform fake news classification on both datasets: Naïve Bayes (NB) [19], Logistic Regression (LR), Support Vector Machine (SVM), Multilayer Perceptron (MLP), Random Forest bagging model (RF), and eXtreme Gradient Boosting model (XGB). The following hyper-parameters were used with each classifier:
• NB: alpha=0.5
• LR: with default values
• SVM: C=1.0, kernel=linear, gamma=3
• MLP: activation function=ReLU, maximum iterations=30, learning rate=0.1
• RF: with default values
• XGB: with default values
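The list above can be instantiated in scikit-learn roughly as follows. This is a minimal sketch under stated assumptions: the mapping of "default values" to library defaults, the use of MultinomialNB for NB, and `learning_rate_init` as the MLP learning rate are interpretations on my part, and XGBoost is treated as an optional dependency.

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

classifiers = {
    "NB": MultinomialNB(alpha=0.5),
    "LR": LogisticRegression(),  # default values
    # Note: scikit-learn ignores gamma when kernel="linear".
    "SVM": SVC(C=1.0, kernel="linear", gamma=3),
    "MLP": MLPClassifier(activation="relu", max_iter=30,
                         learning_rate_init=0.1),
    "RF": RandomForestClassifier(),  # default values
}

try:
    # xgboost is a separate package; skip XGB if it is not installed.
    from xgboost import XGBClassifier
    classifiers["XGB"] = XGBClassifier()  # default values
except ImportError:
    pass
```

Each entry can then be fitted and evaluated on the two corpora with the usual `fit`/`predict` interface.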