sohomghosh/AV_MLWARE1_sarcasm_detect_in_tweets

# Solutions to MLWARE1

MLWARE1 is a text analytics hackathon organised by Analytics Vidhya from 23 to 25 February 2017 (https://datahack.analyticsvidhya.com/contest/mlware-1/). The task is to build a machine learning model that classifies a given tweet as sarcastic or non-sarcastic.

## Dataset

Two files are provided, one for training and one for testing.

training.csv - This file contains three columns:
•ID - ID for each tweet
•tweet - the text of the tweet
•label - the label for the tweet (‘sarcastic’ or ‘non-sarcastic’)

test.csv - This file has two columns, containing the ID and the tweet text. Predictions on this set are what get judged.

## Evaluation

Predictions are evaluated using the F1-score.
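For reference, F1 is the harmonic mean of precision and recall. The sketch below computes it by hand, assuming `sarcastic` is treated as the positive class (the contest page is the authority on that detail). The function name and the sample labels are made up for illustration.

```python
def f1_score(y_true, y_pred, positive="sarcastic"):
    """F1 = 2 * precision * recall / (precision + recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels, purely for illustration.
y_true = ["sarcastic", "non-sarcastic", "sarcastic", "sarcastic"]
y_pred = ["sarcastic", "sarcastic", "non-sarcastic", "sarcastic"]
score = f1_score(y_true, y_pred)  # 2 TP, 1 FP, 1 FN
```

With two true positives, one false positive and one false negative, precision and recall are both 2/3, so the F1-score is also 2/3.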

## Approach

### Feature Extraction

•TF-IDF
•Spelling correction followed by Word2vec
•Number of occurrences of positive & negative words
•Presence of specific/frequent hashtags such as #sarcasm, #not, etc.
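The last two features in the list above can be sketched in a few lines. This is only an illustration, not the code used in this repository: the word lists are tiny hypothetical stand-ins for a real sentiment lexicon, and the function name `handcrafted_features` is invented here.

```python
import re

# Hypothetical, tiny lexicons; a real run would load a full sentiment word list.
POSITIVE_WORDS = {"love", "great", "awesome", "best"}
NEGATIVE_WORDS = {"hate", "terrible", "worst", "awful"}
# The two hashtags named in the feature list.
SARCASM_HASHTAGS = {"#sarcasm", "#not"}


def handcrafted_features(tweet):
    """Count sentiment words and flag the presence of a sarcasm hashtag."""
    text = tweet.lower()
    words = re.findall(r"[a-z']+", text)
    hashtags = set(re.findall(r"#\w+", text))
    return {
        "n_positive": sum(w in POSITIVE_WORDS for w in words),
        "n_negative": sum(w in NEGATIVE_WORDS for w in words),
        "has_sarcasm_hashtag": int(bool(hashtags & SARCASM_HASHTAGS)),
    }


features = handcrafted_features("I just love Monday mornings, best day ever #sarcasm")
```

Each tweet becomes a small dictionary of numeric features that could be appended to the TF-IDF or Word2vec vectors before training.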

*Other cleaning steps or features that could have been used (but are not implemented at present):
a) Keep only English words [Cleaning]
b) Stemming or lemmatization [Cleaning]
c) GloVe
d) Swivel
e) Lda2vec
f) CBOW
g) N-grams: binary feature dictionary
h) Capitalization
i) Topic modeling (LDA)
j) Hidden Markov models
k) Presence of contrasting positive & negative sentiment in the sentence
l) Entity detection
m) Sentence length, number of occurrences of "!", "?", quotes ("") etc. in a sentence
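Several of the unimplemented ideas above (capitalization, sentence length, punctuation counts) are simple surface statistics. A minimal sketch of how they might be computed follows; the function name and feature names are hypothetical and nothing like this exists in the repository yet.

```python
def surface_features(tweet):
    """Length, punctuation and capitalization counts for one tweet."""
    tokens = tweet.split()
    return {
        "n_chars": len(tweet),
        "n_tokens": len(tokens),
        "n_exclamations": tweet.count("!"),
        "n_questions": tweet.count("?"),
        "n_double_quotes": tweet.count('"'),
        # Tokens written fully in capitals, e.g. "SO" or "GREAT".
        "n_all_caps_tokens": sum(1 for t in tokens if len(t) > 1 and t.isupper()),
    }


feats = surface_features('Oh GREAT, another "meeting"!!')
```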
### Modeling

•Rule-based classifier based on statistics: if #sarcasm, #not, etc. is present, the tweet is labelled sarcastic
•GBM in H2O
•Random Forest in H2O
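The rule-based classifier can be illustrated as follows. This is a hedged sketch rather than the repository's actual rule set: only `#sarcasm` and `#not` are taken from the text, and `fallback_label` is a hypothetical placeholder for whatever a trained model (for example the GBM) would predict when no rule fires.

```python
import re

# Hashtags treated as sarcasm markers; "#sarcasm" and "#not" come from the text above.
SARCASM_HASHTAGS = {"#sarcasm", "#not"}


def rule_based_label(tweet):
    """Return 'sarcastic' if a marker hashtag is present, else None (no rule fired)."""
    hashtags = {tag.lower() for tag in re.findall(r"#\w+", tweet)}
    return "sarcastic" if hashtags & SARCASM_HASHTAGS else None


def predict(tweet, fallback_label="non-sarcastic"):
    # fallback_label stands in for a model prediction when the rule is silent.
    return rule_based_label(tweet) or fallback_label
```

Matching whole hashtags (rather than substrings) keeps a tag such as `#nothing` from triggering the `#not` rule.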

*Other modeling approaches that could have been tried (but are not implemented at present):
a) SVM
b) XGBoost
c) Deep Neural Network in H2O
d) Recurrent Neural Networks
e) Other deep learning approaches, e.g. defining and optimizing custom cost functions
### Ensembling of outputs

The outputs of the rule-based classifier and the GBM have been ensembled.

### References

Aditya Joshi, Pushpak Bhattacharyya, Mark J Carman, Automatic Sarcasm Detection: A Survey https://arxiv.org/abs/1602.03426
Chun-Che Peng, Mohammad Lakis, Jan Wei Pan, Detecting Sarcasm in Text: An Obvious Solution to a Trivial Problem http://cs229.stanford.edu/proj2015/044_report.pdf
Roberto González-Ibáñez, Smaranda Muresan, Nina Wacholder, Identifying Sarcasm in Twitter: A Closer Look http://www.aclweb.org/anthology/P11-2102
Dmitry Davidov, Oren Tsur, Ari Rappoport, Semi-Supervised Recognition of Sarcastic Sentences in Twitter and Amazon https://www.aclweb.org/anthology/W/W10/W10-2914.pdf
Dylan Drover, Sarcasm Detection in Product Reviews using Sentence Scale Sentiment Change with Recurrent Neural Networks http://dylandrover.com/sarcasm_project.pdf
Piyoros Tungthamthiti, Kiyoaki Shirai, Masnizah Mohd, Recognition of Sarcasm in Tweets Based on Concept Level Sentiment Analysis and Supervised Learning Approaches http://www.aclweb.org/anthology/Y14-1047
Ellen Riloff, Ashequl Qadir, Prafulla Surve, Lalindra De Silva, Nathan Gilbert, Ruihong Huang, Sarcasm as Contrast between a Positive Sentiment and Negative Situation http://www.anthology.aclweb.org/D/D13/D13-1066.pdf
Aniruddha Ghosh et al., Sentiment Analysis of Figurative Language in Twitter http://alt.qcri.org/semeval2015/cdrom/pdf/SemEval080.pdf
Aniruddha Ghosh, Tony Veale, Fracking Sarcasm using Neural Network https://www.aclweb.org/anthology/W/W16/W16-0425.pdf
Silvio Amir, Byron C. Wallace, Hao Lyu, Paula Carvalho, Mario J. Silva, Modelling Context with User Embeddings for Sarcasm Detection in Social Media https://arxiv.org/abs/1607.00976
Aditya Joshi, Vaibhav Tripathi, Kevin Patel, Pushpak Bhattacharyya, Mark Carman, Are Word Embedding-based Features Useful for Sarcasm Detection? https://www.aclweb.org/anthology/D/D16/D16-1104.pdf
Meishan Zhang, Yue Zhang and Guohong Fu, Tweet Sarcasm Detection Using Deep Neural Network https://www.aclweb.org/anthology/D/D16/D16-1104.pdf
