bert ensemble-learning gpt2 gru keras-tensorflow lstm neural-networks nlp roberta textcnn toxicity-classification transformers

DETECTING TOXIC COMMENTS AND MINIMIZING UNINTENTIONAL PREJUDICE USING NEURAL NETWORKS

Abstract

The internet constitutes a society, and as in every society it has malicious members: users who victimize others in the community with vulgar and provocative comments. Such toxic behavior first discourages victims from exercising their right to freedom of speech and eventually drives them to leave the community. The purpose of this thesis is to investigate and predict toxicity in comments using various neural network architectures. The data set was taken from Kaggle's ‘Jigsaw Unintended Bias in Toxicity Classification’ competition, organized by Jigsaw, a Google research team. Sixteen architectures were built in total: 6 using LSTM, 6 using GRU, 1 using CNN, 1 using BERT, 1 using RoBERTa and 1 using GPT2. Finally, ensemble learning was applied, testing various combinations of the 4 best architectures. The best results came from combining all four, which ranked this solution in the top 6% of the competition.
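The ensemble step described above can be illustrated with a minimal sketch. It assumes the models are combined by unweighted averaging of their predicted probabilities and that candidate combinations are compared by plain ROC AUC on a validation set. Both are assumptions made for illustration: the abstract does not state the combination rule, and the competition scores submissions with its own bias-aware metric. The model names, the labels and the predictions below are all made up.

```python
from itertools import combinations


def roc_auc(labels, scores):
    """Plain ROC AUC: chance that a random positive outranks a random negative.

    Ties count as half a win. This is a stand-in metric, not the
    competition's bias-aware score.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_predictions(preds):
    """Unweighted mean of several per-model probability lists."""
    return [sum(col) / len(col) for col in zip(*preds)]


def score_ensembles(model_preds, labels):
    """Score every combination of two or more models, best first."""
    names = sorted(model_preds)
    results = []
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            blended = average_predictions([model_preds[n] for n in combo])
            results.append((roc_auc(labels, blended), combo))
    return sorted(results, reverse=True)


# Hypothetical validation labels and per-model toxicity probabilities.
labels = [1, 0, 1, 0, 1, 0]
model_preds = {
    "lstm": [0.9, 0.2, 0.4, 0.6, 0.8, 0.1],
    "gru": [0.7, 0.3, 0.6, 0.2, 0.5, 0.4],
    "bert": [0.8, 0.1, 0.7, 0.3, 0.6, 0.5],
    "gpt2": [0.6, 0.4, 0.5, 0.5, 0.9, 0.2],
}

for auc, combo in score_ensembles(model_preds, labels):
    print(f"{auc:.3f}  {' + '.join(combo)}")
```

With four models this scores 11 combinations (six pairs, four triples and the full set). A weighted average or a stacked meta-model would fit the same loop by swapping out `average_predictions`.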

About

This repository contains all the code needed to complete my Bachelor thesis on the detection of toxic comments.

Apache License 2.0

Languages

Python 100.0%