Labelling-rules-influence-on-Multinomial-Naive-Bayes-classifier-SPAM-noSPAM

Explore how different labeling strategies affect the performance of a machine learning model by simulating the process of having different labelers label the data. This is done by defining a set of rules and labeling the data automatically based on those rules.
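The idea of labeling by rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's code. The three rules, the encoding (1 = spam, 0 = not_spam, -1 = abstain), and the majority-vote combination in `label_comment` are all hypothetical. Changing the rule set or the way the votes are combined is one plausible way to produce different "labelers".

```python
import re

# Hypothetical label encoding: 1 = spam, 0 = not_spam, -1 = abstain (rule does not apply)
SPAM, NOT_SPAM, ABSTAIN = 1, 0, -1


def rule_has_link(comment):
    # Hypothetical rule: comments containing a URL look like spam
    return SPAM if re.search(r"https?://|www\.", comment, re.IGNORECASE) else ABSTAIN


def rule_self_promotion(comment):
    # Hypothetical rule: self-promotion phrases look like spam
    text = comment.lower()
    return SPAM if "subscribe" in text or "check out" in text else ABSTAIN


def rule_short_comment(comment):
    # Hypothetical rule: very short comments (< 5 words) are usually genuine reactions
    return NOT_SPAM if len(comment.split()) < 5 else ABSTAIN


def label_comment(comment, rules, default=NOT_SPAM):
    """Combine rule votes by majority; fall back to `default` on a tie or when no rule fires."""
    votes = [vote for vote in (rule(comment) for rule in rules) if vote != ABSTAIN]
    spam, not_spam = votes.count(SPAM), votes.count(NOT_SPAM)
    if spam != not_spam:
        return SPAM if spam > not_spam else NOT_SPAM
    return default


rules = [rule_has_link, rule_self_promotion, rule_short_comment]
comments = [
    "Check out my channel http://example.com",
    "Love this song!",
    "Please subscribe to my channel for more music videos",
    "This song brings back so many memories of summer",
]
print([label_comment(c, rules) for c in comments])  # prints [1, 0, 1, 0]
```

In this example, the last comment triggers no rule and therefore receives the default label. How abstentions and ties are resolved is a design choice, and it can noticeably change the quality of the generated labels.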

The main objective of this lab is to compare model performance across labeling options, in order to understand the role that good labeling plays in the performance of machine learning models. The options are:

Randomly generated labels (performance lower bound)
Automatically generated labels based on three different labeling strategies
True labels (performance upper bound)

The case study uses a dataset of comments from the five most popular YouTube videos of 2015. Each comment has been labeled as spam or not_spam depending on its content.
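The comparison across label sources can be sketched as follows, assuming that every model is trained on the same comments with a different set of labels and is then scored against the true labels of held-out comments. A library implementation such as scikit-learn's `MultinomialNB` would normally be used. The small from-scratch multinomial Naive Bayes below (word counts, add-one smoothing) is a dependency-free stand-in. The toy comments and the single "contains link" rule are made up for illustration and are not the lab's data or rules.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Simple lowercase whitespace tokenizer (a stand-in for a real vectorizer)
    return text.lower().split()


class MultinomialNB:
    """Minimal multinomial Naive Bayes over word counts with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in tokenize(d))
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[w] + self.alpha) / denom) for w in self.vocab}
        return self

    def predict(self, docs):
        preds = []
        for d in docs:
            # Words never seen during training are ignored
            words = [w for w in tokenize(d) if w in self.vocab]
            scores = {c: self.log_prior[c] + sum(self.log_lik[c][w] for w in words)
                      for c in self.classes}
            preds.append(max(scores, key=scores.get))
        return preds


def evaluate(train_docs, train_labels, test_docs, test_labels):
    """Train on (possibly noisy) labels, then score accuracy against the true test labels."""
    preds = MultinomialNB().fit(train_docs, train_labels).predict(test_docs)
    return sum(p == t for p, t in zip(preds, test_labels)) / len(test_labels)


# Toy, made-up comments (1 = spam, 0 = not_spam); not the lab's dataset
train_docs = [
    "check out my channel http://spam.example",
    "subscribe to my channel for free gifts",
    "free gifts click http://spam.example now",
    "love this song so much",
    "this song brings back memories",
    "great video love it",
]
true_labels = [1, 1, 1, 0, 0, 0]
test_docs = ["subscribe for free gifts", "love this video"]
test_labels = [1, 0]

rng = random.Random(0)
label_sources = {
    "random (lower bound)": [rng.randint(0, 1) for _ in train_docs],
    # Hypothetical single rule: a comment is spam if it contains a link
    "rule: contains link": [1 if "http" in d else 0 for d in train_docs],
    "true (upper bound)": true_labels,
}
for name, labels in label_sources.items():
    print(f"{name}: accuracy = {evaluate(train_docs, labels, test_docs, test_labels):.2f}")
```

On this toy data, the model trained on true labels classifies both test comments correctly (1.00). The link-only rule mislabels the "subscribe ... free gifts" training comment as not_spam, so its model misses the spam test comment (0.50). The random-label score depends on the seed.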
