Shubha23/Fake-News-Detection-Text-Preprocessing-and-Classification

nlp-machine-learning text-classification text-processing nltk-library sklearn news tfidfvectorizer

Aim: Detect fake news by classifying news pieces as real or fake.

Dataset: Kaggle.com (Fake News balanced dataset for fake news analysis)

Files: Jupyter Notebook (Python), zipped dataset file.

Author: Shubha Mishra

The project accomplishes these tasks:

Text cleaning and preprocessing of the fake_or_real_news dataset using NLTK and regular expressions.

Transforming the cleaned text into TF-IDF vectors.

Training models such as the Passive Aggressive Classifier, XGBoost, and LightGBM (LGBM) to classify news pieces as fake or real. (A few other algorithms were also tried, but only the best performers are included here.)

Evaluating each model's performance using the accuracy scores and confusion matrices it produced.
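As a rough illustration of the cleaning task, the sketch below lowercases a text, strips URLs and non-letter characters with regular expressions, and drops stopwords. It is not the notebook's code: the function name `clean_text`, the exact patterns, and the tiny hard-coded stopword set are assumptions made so the example runs without downloading NLTK data. The notebook itself uses NLTK, whose English stopword list is much larger.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch);
# NLTK's English stopword corpus would normally be used instead.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "of", "in", "to",
    "and", "that", "this", "it", "on", "for", "at",
}


def clean_text(text: str) -> str:
    """Lowercase the text, strip URLs and non-letters, and drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    return " ".join(tokens)


sample = "BREAKING: The senator is lying!!! Read more at https://example.com/story"
print(clean_text(sample))
```

A real preprocessing step might also stem or lemmatize tokens; that is left out here to keep the sketch short.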

Please see the notebook for details on each of these steps.
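For orientation, the vectorization, training, and evaluation tasks can be sketched with scikit-learn as follows. This is a minimal example under stated assumptions, not the notebook's implementation: the toy `texts` and `labels`, the "FAKE"/"REAL" label strings, the split ratio, and the hyperparameters are all hypothetical. Only the Passive Aggressive Classifier is shown; XGBoost and LGBM models would be fitted on the same TF-IDF matrices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus standing in for the cleaned news texts.
texts = [
    "aliens secretly control world government insiders claim",
    "miracle pill cures every disease doctors stunned",
    "celebrity clone spotted shocking leaked photos reveal",
    "moon landing staged anonymous source confirms hoax",
    "city council approves budget for road repairs",
    "central bank holds interest rates steady this quarter",
    "researchers publish study on regional rainfall trends",
    "parliament passes bill on school funding reform",
]
labels = ["FAKE", "FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL"]

# Hold out a stratified test set so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Fit TF-IDF on the training texts only, then reuse the vocabulary on the test texts.
vectorizer = TfidfVectorizer()
tfidf_train = vectorizer.fit_transform(X_train)
tfidf_test = vectorizer.transform(X_test)

# Train the classifier and predict labels for the held-out texts.
clf = PassiveAggressiveClassifier(max_iter=50, random_state=0)
clf.fit(tfidf_train, y_train)
y_pred = clf.predict(tfidf_test)

# Evaluate with accuracy and a confusion matrix (rows: true label, columns: predicted).
acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=["FAKE", "REAL"])
print(f"accuracy: {acc:.2f}")
print(cm)
```

With a corpus this small the scores are meaningless; the point is only the order of the steps and the fact that the vectorizer is fitted on the training split alone.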


About

Fake news detection by classifying news segments as real or fake. Required installations: Python 3.8, NLTK, Scikit-Learn, Jupyter. Covers text cleaning, tokenization, vectorization, classification model generation, and evaluation.


GNU General Public License v3.0

Languages

Language: Jupyter Notebook 100.0%