prashant-kikani / toxic-comment-classifier

To classify toxic and abusive comments in large volumes of text.


toxic-comment-classifier

This project classifies toxic and abusive comments in large volumes of text.
The model is trained on these 6 types of toxic comments:

  1. toxic
  2. severe_toxic
  3. obscene
  4. threat
  5. insult
  6. identity_hate
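A minimal multi-label setup for these six classes could look like the sketch below, using scikit-learn's `OneVsRestClassifier` over TF-IDF features. The toy comments and labels here are invented for illustration, and the actual model in the notebook may differ:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Toy training data; real training uses the Kaggle dataset linked below.
comments = [
    "you are an idiot",
    "i will hurt you",
    "what the heck is this garbage",
    "go back to your country",
    "you absolute moron",
    "have a nice day",
]
# One binary indicator per label; a row can have several 1s (multi-label).
y = np.array([
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
])

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(comments)

# One binary logistic-regression classifier per label.
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, y)

# predict_proba gives one toxicity probability per class for each comment.
probs = clf.predict_proba(vectorizer.transform(["you idiot"]))
print(dict(zip(LABELS, probs[0].round(3))))
```

Because each label gets its own classifier, the six probabilities are independent and do not need to sum to 1, which is exactly what a multi-label problem calls for.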

It predicts the probability of toxicity for each of the classes defined above.
A comment may be classified into one or more classes. For example, a comment may be "insulting" to someone but not "threatening".

Download the train/test data from here: https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/data

Data cleaning is done before training and testing: only alphabetic letters and some punctuation marks are kept, with no numbers or other symbols.
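That cleaning step can be sketched with a regular expression that keeps only letters, whitespace, and a few punctuation marks. The exact pattern below is my guess at the rule; the notebook's actual preprocessing may differ:

```python
import re

def clean_comment(text):
    # Lowercase, then keep only ASCII letters, whitespace, and basic punctuation.
    text = text.lower()
    text = re.sub(r"[^a-z\s.,!?']", " ", text)
    # Collapse runs of whitespace left behind by the removed characters.
    return re.sub(r"\s+", " ", text).strip()

print(clean_comment("You're SO dumb!!! 123 @#$%"))  # -> you're so dumb!!!
```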

This task can be applied to any given chunk of text, and the highest-probability class can be picked as the dominant type of toxicity.
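Picking the dominant type from the per-class probabilities is then a simple argmax. The helper below and its 0.5 threshold are illustrative assumptions, not code from the notebook:

```python
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def dominant_toxicity(probs, threshold=0.5):
    """Return the most probable toxicity type, or None if no class crosses the threshold."""
    best = max(range(len(LABELS)), key=lambda i: probs[i])
    return LABELS[best] if probs[best] >= threshold else None

print(dominant_toxicity([0.9, 0.1, 0.2, 0.05, 0.7, 0.02]))  # -> toxic
print(dominant_toxicity([0.1, 0.0, 0.1, 0.0, 0.2, 0.0]))    # -> None
```

The threshold keeps clearly non-toxic comments from being forced into a toxicity class just because some class must have the largest probability.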

Have a look at https://github.com/prashant-kikani/toxic-comment-classifier/blob/master/test.ipynb to get a clear idea.


Languages

Language: Jupyter Notebook 84.9%, Python 15.1%