
Toxic_Comment_Classification

My attempt at tackling the Kaggle Toxic Comment Classification competition.

I built a model that estimates the probability of a comment belonging to each of the toxicity classes. I used XGBoost after generating feature vectors with GloVe and Google News Word2Vec embeddings.
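
As a rough illustration of that pipeline (not the exact code in this repo), here is a minimal sketch that averages pretrained embeddings per comment and fits one binary XGBoost classifier per class; the helper name `comment_vector` and all hyperparameter values are assumptions:

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Hypothetical helper: average the pretrained vectors (GloVe or Word2Vec)
# of a comment's tokens to get one fixed-size feature vector per comment.
def comment_vector(tokens, embeddings, dim=300):
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

# X: (n_comments, 300) averaged embeddings; y: 0/1 labels for one class.
# The competition has six classes, so this would be run once per class.
def train_one_class(X, y):
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.1,
                                                random_state=42)
    clf = xgb.XGBClassifier(n_estimators=300, max_depth=4,
                            learning_rate=0.1, eval_metric="auc")
    clf.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
    return clf  # clf.predict_proba(X)[:, 1] gives class probabilities
```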

The model achieved an overall AUC of 0.82.

Resources needed:

  • Download the data from the Kaggle competition page here
  • Download the GloVe word vectors here; choose the glove.840B.300d model (a loading sketch follows this list)
  • Download the Google News Word2Vec vectors here
  • To use the Keras model built in example_to_clarify.py, you need to download the 20 Newsgroups dataset
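
For reference, here is one common way to load these two embedding files; the filenames are assumptions based on the standard downloads:

```python
import numpy as np
from gensim.models import KeyedVectors

# GloVe ships as plain text: token(s) followed by 300 floats per line.
# Taking the last 300 fields as the vector also handles rare multi-word tokens.
def load_glove(path="glove.840B.300d.txt"):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word = " ".join(parts[:-300])
            embeddings[word] = np.asarray(parts[-300:], dtype="float32")
    return embeddings

# The Google News vectors ship in binary word2vec format.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)
```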

Note:

The final_try.py file is an implementation of the XGBoost algorithm on the same data.

To Do:

  1. You can definitely do much more hyperparameter optimization, especially for the LSTM model. For example, try playing around with max_features, max_len, the dropout rate, the size of the Dense layer, etc. (see the sketch after this list).

  2. You can try different feature engineering and normalization techniques for the text data.

  3. In general, try playing around with parameters like batch_size, num_epochs, and learning_rate.

  4. Try using different optimizers, e.g., Adagrad, Adadelta, or SGD.
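
To make those knobs concrete, here is a minimal Keras LSTM sketch (an assumed architecture, not the exact model in this repo) showing where max_features, max_len, the dropout rate, the Dense size, and the optimizer come in; all values are illustrative:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dropout, Dense
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Illustrative values only; these are the knobs from items 1-3.
max_features = 20000   # vocabulary size
max_len = 100          # padded length: X = pad_sequences(seqs, maxlen=max_len)
dropout_rate = 0.2
dense_size = 50

model = Sequential([
    Embedding(max_features, 300),     # 300-d to match the pretrained vectors
    LSTM(64),
    Dropout(dropout_rate),
    Dense(dense_size, activation="relu"),
    Dense(6, activation="sigmoid"),   # one probability per toxicity class
])

# Swap "adam" for Adagrad, Adadelta, or SGD (item 4); batch_size and
# num_epochs (item 3) go into model.fit(...).
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
# model.fit(X, y, batch_size=32, epochs=2, validation_split=0.1)
```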

About


License: MIT License


Languages

Jupyter Notebook 99.8%, Python 0.2%