python3 machine-learning naive-bayes-classification multilabel-classification matplotlib svm-classifier toxic-comment-classification

Toxic comment classification.

Problem Statement

Every day on social media we encounter comments, reviews, and tweets that may hurt the sentiments of a particular group or community. Such comments are considered toxic. This project addresses that problem by classifying social media comments into six categories of toxicity: Toxic, Severe-toxic, Obscene, Threat, Insult, and Identity_hate. This is a multi-label classification problem, meaning that a given comment may belong to more than one category at the same time.
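To make the multi-label setup concrete, here is a minimal sketch of how one comment's categories might be encoded as a 0/1 indicator vector. The lowercase label identifiers and the helper names `encode_labels` and `decode_labels` are illustrative assumptions, not taken from the project's notebook.

```python
# Label order follows the six categories named above.
# The lowercase identifiers are an assumption made for this sketch.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def encode_labels(active):
    """Turn a set of category names into a 0/1 indicator vector."""
    unknown = set(active) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in active else 0 for name in LABELS]


def decode_labels(vector):
    """Turn a 0/1 indicator vector back into a list of category names."""
    return [name for name, flag in zip(LABELS, vector) if flag]


# A comment can carry several labels at once, or none at all.
both = encode_labels({"toxic", "insult"})
clean = encode_labels(set())

print(both)                 # indicator vector with two active positions
print(decode_labels(both))  # the two category names, in LABELS order
print(clean)                # all zeros: a non-toxic comment
```

A comment with an all-zero vector belongs to none of the categories, which is what distinguishes this setup from ordinary multi-class classification, where exactly one class applies.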

Language and Libraries used.

Python 3.7
Numpy
Pandas
Matplotlib
NLTK
Seaborn

Steps involved

Getting the dataset.
Getting insights from dataset using visualisation tools.
Preprocessing the data using NLTK.
Applying Multi Label classification algorithms.
Comparing the results and choosing the best among them.
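The preprocessing and classification steps above can be sketched roughly as follows. This is a standard-library-only illustration, not the project's code: the small `STOPWORDS` set stands in for NLTK's stopword corpus, `KeywordClassifier` is a hypothetical toy base learner standing in for a real one such as an SVM, and the training comments and the two-label setup are made up for the example. The part that carries over is the Binary Relevance idea itself: one independent binary classifier is trained per label.

```python
import re
from collections import Counter

# Tiny stand-in stopword list (assumption); NLTK's stopword corpus
# would normally be used instead.
STOPWORDS = {"a", "an", "the", "is", "are", "you", "i", "to", "and", "of"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class KeywordClassifier:
    """Toy binary base learner (hypothetical): predicts 1 when a comment
    contains a token seen only in positive training examples."""

    def fit(self, docs, y):
        pos = Counter(t for d, flag in zip(docs, y) if flag for t in d)
        neg = Counter(t for d, flag in zip(docs, y) if not flag for t in d)
        self.keywords = {t for t in pos if t not in neg}
        return self

    def predict(self, docs):
        return [int(any(t in self.keywords for t in d)) for d in docs]


class BinaryRelevance:
    """Train one independent binary classifier per label column."""

    def __init__(self, make_classifier):
        self.make_classifier = make_classifier

    def fit(self, docs, Y):
        n_labels = len(Y[0])
        self.models = [
            self.make_classifier().fit(docs, [row[j] for row in Y])
            for j in range(n_labels)
        ]
        return self

    def predict(self, docs):
        # Each model yields one column; transpose the columns back into rows.
        columns = [m.predict(docs) for m in self.models]
        return [list(row) for row in zip(*columns)]


# Made-up training data with two labels: [toxic, threat].
train_texts = [
    "You are an idiot",
    "I will hurt you",
    "Thanks for the helpful edit",
    "The article is well written",
]
train_labels = [[1, 0], [1, 1], [0, 0], [0, 0]]

model = BinaryRelevance(KeywordClassifier)
model.fit([preprocess(t) for t in train_texts], train_labels)

test_texts = ["You idiot, I will hurt you", "What an idiot", "Well written, thanks"]
predictions = model.predict([preprocess(t) for t in test_texts])
print(predictions)
```

Because each label gets its own model, Binary Relevance ignores correlations between labels (for example, between Toxic and Insult). That simplicity is a common reason to use it as a baseline when comparing multi-label methods.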

Results

Achieved an accuracy score of 88.16% using the Binary Relevance method with an SVM classifier.
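"Accuracy" can be defined in more than one way for multi-label problems, and the text above does not say which definition was used. As an illustration only, the sketch below contrasts two common definitions on made-up predictions: subset accuracy (the whole label vector must match) and per-label accuracy (each label decision is scored separately). The function names and the sample data are assumptions for this example.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of comments whose full label vector is predicted exactly."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if list(t) == list(p))
    return matches / len(y_true)


def per_label_accuracy(y_true, y_pred):
    """Fraction of individual label decisions that are correct."""
    total = sum(len(t) for t in y_true)
    correct = sum(a == b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return correct / total


# Made-up ground truth and predictions for four comments and three labels.
y_true = [[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]]

print(subset_accuracy(y_true, y_pred))     # 2 of 4 rows match exactly
print(per_label_accuracy(y_true, y_pred))  # 10 of 12 label decisions match
```

Subset accuracy is the stricter of the two, so the same predictions can score quite differently depending on which definition is reported.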

About

Multilabel classification of comments based on their toxicity.

Languages

Jupyter Notebook 100.0%