analytics-vidhya-competition bag-of-words data-analysis logistic-regression randomforest tf-idf twitter-sentiment-analysis word2vec xgboost

Twitter-sentiment-analysis

The objective of this task is to detect hate speech in tweets. For simplicity, a tweet is considered to contain hate speech if it expresses racist or sexist sentiment. The task is therefore to distinguish racist or sexist tweets from all other tweets. Formally, given a training sample of tweets and labels, where label '1' denotes that the tweet is racist/sexist and label '0' denotes that it is not, your objective is to predict the labels on the test dataset.

Data Files

train.csv - For training the models, we provide a labelled dataset of 31,962 tweets. The dataset is a CSV file in which each line stores a tweet id, its label, and the tweet text. There is one (public) test file.

test_tweets.csv - The test data file contains only tweet ids and the tweet text, with each tweet on a new line.
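As a rough illustration of the file layout described above, the following sketch reads labelled tweets with Python's standard `csv` module. The column names (`id`, `label`, `tweet`), the helper `load_tweets`, and the sample rows are assumptions made for this example; check the header of the actual files before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the layout described above: id, label, tweet.
# The column names and the rows are made up for illustration.
SAMPLE_TRAIN = """id,label,tweet
1,0,what a lovely morning
2,1,placeholder text standing in for an offensive tweet
3,0,"coffee, then work"
"""


def load_tweets(handle, labelled=True):
    """Read tweets from an open CSV file into a list of dicts."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"id": int(record["id"]), "tweet": record["tweet"]}
        if labelled:
            # Label 1 = racist/sexist, label 0 = not racist/sexist.
            row["label"] = int(record["label"])
        rows.append(row)
    return rows


train = load_tweets(io.StringIO(SAMPLE_TRAIN))

# Count how many tweets fall into each class.
label_counts = Counter(row["label"] for row in train)
print(label_counts)
```

For the real data, the same helper could be called with `open("train.csv", newline="", encoding="utf-8")`, and with `labelled=False` for test_tweets.csv, assuming the files carry those column names.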

About

Data analytics project on Twitter sentiment analysis. It uses Logistic Regression, Random Forest, and XGBoost models on Bag-of-Words, TF-IDF, and Word2Vec features.
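To make two of the feature types named above concrete, here is a minimal from-scratch sketch of Bag-of-Words counts and TF-IDF weights using only the standard library. It is not the notebook's code: the whitespace tokenizer, the function names, and the toy documents are assumptions for illustration, and the idf used is the plain unsmoothed form log(N / df).

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        vec = [0] * len(vocab)
        for tok, count in Counter(toks).items():
            vec[index[tok]] = count
        vectors.append(vec)
    return vocab, vectors


def tf_idf(vectors):
    """Weight raw counts by idf = log(N / df), the unsmoothed textbook form."""
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for vec in vectors if vec[j] > 0) for j in range(n_terms)]
    idf = [math.log(n_docs / df[j]) for j in range(n_terms)]
    return [[vec[j] * idf[j] for j in range(n_terms)] for vec in vectors]


# Toy documents, made up for illustration.
docs = ["good day good mood", "bad day", "good vibes"]
vocab, counts = bag_of_words(docs)
weights = tf_idf(counts)
```

Terms that appear in fewer documents receive a larger idf, so in the second toy document "bad" outweighs "day". Library implementations often smooth the idf and normalise each vector, so their values will generally differ from this sketch.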


Languages

Jupyter Notebook 100.0%