daniel-was-taken / Fake-News-Detection

Fake news detection using TF-IDF vectorization and LinearSVC

Fake News Detection

Given a dataset of news articles or headlines, the goal is to classify each item as either "Fake" or "Authentic". Fake news is news that contains false or intentionally misleading information, while authentic news contains accurate, factual information. The challenge is to develop a model that can reliably distinguish between the two.

Working Implementation

Demo_Fake-News-Detection.mp4

Proposed Solution

The proposed solution is a Python-based machine learning model that takes a dataset of news articles and performs preprocessing, vectorization, and training to classify each article as fake or authentic. We create a pipeline that combines TF-IDF vectorization with the Linear Support Vector Classification (LinearSVC) algorithm, which has shown high accuracy in detecting fake news. Exploratory data analysis is also performed on the dataset.
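The TF-IDF plus LinearSVC pipeline described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual code: the tiny corpus, the labels, the build_pipeline helper, and the vectorizer settings (lowercasing, English stop words) are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Tiny invented corpus, for illustration only (not the project's dataset).
texts = [
    "Local council approves budget for new public library",
    "Central bank holds interest rates steady this quarter",
    "University publishes peer reviewed study on sleep patterns",
    "City opens new bus route connecting suburbs to downtown",
    "Aliens secretly replaced the moon with a hologram, insiders say",
    "Miracle pill cures every disease overnight, doctors hate it",
    "Celebrity reveals shocking secret the government is hiding",
    "Shocking secret trick makes you a millionaire overnight",
]
labels = ["Authentic"] * 4 + ["Fake"] * 4


def build_pipeline() -> Pipeline:
    """Hypothetical helper: chain TF-IDF vectorization with a LinearSVC classifier."""
    return Pipeline(
        [
            # Turn raw text into TF-IDF weighted term vectors.
            ("tfidf", TfidfVectorizer(lowercase=True, stop_words="english")),
            # Fit a linear support vector classifier on those vectors.
            ("clf", LinearSVC(random_state=0)),
        ]
    )


model = build_pipeline()
model.fit(texts, labels)

# Classify an unseen headline; the result is one of the two training labels.
prediction = model.predict(["Shocking miracle pill hides secret truth"])
print(prediction[0])
```

Wrapping both steps in a single Pipeline means the vectorizer is fitted only on the training text and is reapplied automatically at prediction time, so raw strings can be passed straight to fit and predict.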

To Test

Installation

  1. Create a virtual environment.

    • This project uses the virtualenv package, which can be installed by running pip install virtualenv in the terminal.
    • Create a virtual environment by running python -m virtualenv venv.
    • Activate the virtual environment by running venv\Scripts\activate on Windows (or source venv/bin/activate on macOS/Linux).
  2. Install the required packages.

    • Install the packages by running pip install -r requirements.txt.
    • This should install everything needed, although some of the listed packages may have been deprecated.
  3. Run the cells in "prerequisites.ipynb".

  4. In the terminal, run streamlit run analysis.py (this will take some time to run).

    • hosted_analysis.py: does not use PySpark.
    • analysis.py: uses PySpark.

License: Apache License 2.0
