osemrt / Spam-Filter

💬 A spam-ham filter using NLTK and Multinomial Naive Bayes classifier

Spam-Filter

In machine learning, naïve Bayes classifiers are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong (naïve) independence assumptions between the features. They are among the simplest Bayesian network models.
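To make the independence assumption concrete, the sketch below scores a tokenized message against each class by combining a class prior with per-word likelihoods in log space. The `priors` and `likelihoods` structures are hypothetical placeholders, not taken from this repository's notebook.

```python
# A minimal sketch of the naive Bayes decision rule, assuming the model's
# parameters (class priors and per-class word likelihoods) are already known:
# P(class | words) is proportional to P(class) * product of P(word | class).
import math

def predict(tokens, priors, likelihoods):
    """Return the class with the highest log-posterior for a tokenized message."""
    best_class, best_score = None, float("-inf")
    for cls, prior in priors.items():
        score = math.log(prior)
        for tok in tokens:
            # Words unseen during training get a tiny fallback probability here.
            score += math.log(likelihoods[cls].get(tok, 1e-9))
        if score > best_score:
            best_class, best_score = cls, score
    return best_class
```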

Naïve Bayes has been studied extensively since the 1960s. It was introduced (though not under that name) into the text retrieval community in the early 1960s, and remains a popular (baseline) method for text categorization, the problem of judging documents as belonging to one category or the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the features. With appropriate pre-processing, it is competitive in this domain with more advanced methods including support vector machines.
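In that spirit, a minimal word-frequency spam/ham pipeline with a Multinomial Naive Bayes classifier might look like the sketch below. It uses scikit-learn's CountVectorizer and MultinomialNB with made-up example messages; the actual notebook may preprocess the text with NLTK and differ in its details.

```python
# A hedged sketch of a word-frequency spam/ham pipeline in the spirit of this
# repository (the real notebook may tokenize and clean text with NLTK first).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

messages = ["win a free prize now", "meeting moved to 3pm", "free cash offer"]
labels = ["spam", "ham", "spam"]

vectorizer = CountVectorizer()          # word frequencies as features
X = vectorizer.fit_transform(messages)

clf = MultinomialNB()                   # multinomial naive Bayes classifier
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["free prize inside"])))  # -> ['spam']
```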

It also finds application in automatic medical diagnosis. Naïve Bayes classifiers are highly scalable, requiring a number of parameters linear in the number of variables (features/predictors) in a learning problem. Maximum-likelihood training can be done by evaluating a closed-form expression, which takes linear time, rather than by expensive iterative approximation as used for many other types of classifiers. In the statistics and computer science literature, naive Bayes models are known under a variety of names, including simple Bayes and independence Bayes.
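To illustrate the closed-form, linear-time training step, the sketch below estimates the class priors and (Laplace-smoothed) word likelihoods from a single counting pass over the training data. The function and data layout are illustrative assumptions, not the repository's code, and the result plugs directly into the `predict` sketch above.

```python
# A minimal sketch of closed-form training: priors and word likelihoods come
# straight from one counting pass over the data, with Laplace smoothing
# (alpha) to avoid zero probabilities. The data layout (lists of token lists
# plus labels) is an illustrative assumption.
from collections import Counter, defaultdict

def train(docs, labels, alpha=1.0):
    class_counts = Counter(labels)        # document counts per class -> priors
    word_counts = defaultdict(Counter)    # word counts per class -> likelihoods
    vocab = set()
    for tokens, cls in zip(docs, labels):
        word_counts[cls].update(tokens)
        vocab.update(tokens)

    priors = {c: n / len(labels) for c, n in class_counts.items()}
    likelihoods = {}
    for cls, counts in word_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        likelihoods[cls] = {w: (counts[w] + alpha) / total for w in vocab}
    return priors, likelihoods
```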

Languages

Language: Jupyter Notebook 100.0%