charliezcr / Python-Bayesian-Spam-Filter

This is a spam filter implemented by using Bayes' Theorem and Python's NLTK package to perform basic text analysis

Home Page: https://charliezcr.github.io/SpamFilter.html

Python Bayesian Spam Filter

Project Overview

We receive a lot of mail, but our mailbox automatically filters out spam and keeps only ham (the mail you actually want, the opposite of spam) in the inbox. How exactly does a mailbox calculate whether a message is spam or not? This project is a spam filter implemented in Python to showcase how a Naive Bayes classifier and the Bag-of-Words model can be used for that task.
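To make the idea concrete, here is a minimal sketch of a Naive Bayes classifier over a Bag-of-Words model. It is not the code from this repository: the function names (tokenize, train, classify), the toy training messages, the whitespace tokenizer (used instead of NLTK so the example has no dependencies), and the add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data; the repository ships its own training files.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free cash offer click now", "spam"),
    ("are we still meeting for lunch", "ham"),
    ("please review the project notes", "ham"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for NLTK tokenization)."""
    return text.lower().split()


def train(messages):
    """Count words per label (the bag of words) and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the higher log posterior score."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        # Log prior: the fraction of training messages with this label.
        score = math.log(label_counts[label] / total_messages)
        words_in_label = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Log likelihood with add-one smoothing to avoid log(0).
            count = word_counts[label][word] + 1
            score += math.log(count / (words_in_label + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
print(classify("free prize offer", model))
print(classify("meeting for lunch", model))
```

Working in log space keeps the product of many small probabilities from underflowing. A fuller version would also strip stopwords and punctuation before counting.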

Contents

For a detailed walk-through of the code and an explanation of the theory, please see the Python notebook or the website.
If you are more interested in the code itself, please read the Python file.
The remaining .txt files are the training and testing data.

Modules

  • nltk: natural language processing toolkit. Install it with:
pip install nltk

Then download the Punkt tokenizer models and the stopword list from within Python:
import nltk
nltk.download('punkt')
nltk.download('stopwords')

About

License: MIT License


Languages

Jupyter Notebook: 82.3%
Python: 17.7%