A basic example of predicting sarcastic news headlines with traditional machine learning models and an ensembling technique.
This repo accompanies my blog post here.
Datasets can be downloaded through this link.
- Text analysis, including word counts, most frequent words, and a word cloud
- Build a text preprocessing pipeline with a bag-of-n-grams representation, nltk's casual_tokenize, and classifiers
- Ensemble the text preprocessing pipeline with multiple classifiers
- Deploy the model with Flask and Heroku
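The text-analysis step (word counts and most frequent words) can be sketched with only the standard library. This is a minimal sketch, not the repo's code: it assumes headlines are plain strings and uses a naive lowercase whitespace split as the tokenizer. The function name `word_stats` and the sample headlines are hypothetical, and the word cloud is not reproduced here.

```python
from collections import Counter


def word_stats(headlines, top_n=3):
    """Return the word count of each headline and the top_n most frequent words.

    Tokenization is a naive lowercase whitespace split (an assumption;
    the repo itself may tokenize differently).
    """
    tokenized = [h.lower().split() for h in headlines]
    # Number of words in each headline
    lengths = [len(tokens) for tokens in tokenized]
    # Frequency of every word across the whole corpus
    freq = Counter(token for tokens in tokenized for token in tokens)
    return lengths, freq.most_common(top_n)


# Hypothetical sample headlines, not taken from the dataset
headlines = [
    "area man thrilled to attend another meeting",
    "scientists discover new species of frog",
    "area man discovers meeting could have been an email",
]
lengths, top_words = word_stats(headlines)
print(lengths)
print(top_words)
```

On the real dataset, stop words such as "to" or "of" would likely dominate the frequency list, so filtering them out before counting is a common refinement.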
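One way to ensemble a text preprocessing pipeline with multiple classifiers is scikit-learn's `VotingClassifier` with soft voting, placed after a shared bag-of-n-grams vectorizer. This is a hedged sketch: the repo's actual ensembling technique and choice of base models may differ, and the three classifiers and toy data below are assumptions made for illustration.

```python
from nltk.tokenize import casual_tokenize
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy headlines (1 = sarcastic, 0 = not sarcastic)
headlines = [
    "area man thrilled to attend another meeting",
    "nation relieved to learn monday only lasts one day",
    "scientists discover new species of frog in peru",
    "city council approves budget for road repairs",
]
labels = [1, 1, 0, 0]

# Soft voting averages the predicted class probabilities of the
# base classifiers, so each one must implement predict_proba.
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)

model = Pipeline([
    # One shared bag-of-n-grams vectorizer feeds every classifier
    ("bow", CountVectorizer(
        tokenizer=casual_tokenize,
        token_pattern=None,
        ngram_range=(1, 2),
    )),
    ("ensemble", ensemble),
])

model.fit(headlines, labels)
new_headlines = ["area man thrilled about monday", "council approves new road budget"]
proba = model.predict_proba(new_headlines)  # one row per headline
predictions = model.predict(new_headlines)
print(proba)
print(predictions)
```

Soft voting lets a confident model outweigh uncertain ones; hard voting (`voting="hard"`) would instead take a majority vote of predicted labels and does not require `predict_proba`.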
If you want to play with it directly, the model has been deployed here using Heroku. Caution: this model was trained on sarcastic news headlines; other sarcastic text, such as tweets or conversations, may yield low predictive accuracy.
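The Flask side of the deployment can be sketched as a small JSON endpoint that wraps a trained pipeline. This is a minimal sketch, not the deployed app: the `/predict` route, the `headline` field, and the `load_model` helper are hypothetical, and the helper trains a tiny toy model only so the example runs on its own. A real app would instead load the trained pipeline from disk, for example with `joblib.load` on a saved model file.

```python
from flask import Flask, jsonify, request
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def load_model():
    """Hypothetical stand-in for loading the trained pipeline from disk."""
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    # Toy training data (1 = sarcastic, 0 = not sarcastic)
    model.fit(
        [
            "area man thrilled to attend another meeting",
            "nation relieved to learn monday only lasts one day",
            "scientists discover new species of frog in peru",
            "city council approves budget for road repairs",
        ],
        [1, 1, 0, 0],
    )
    return model


app = Flask(__name__)
model = load_model()  # load once at startup, not on every request


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"headline": "..."}
    payload = request.get_json(silent=True) or {}
    headline = payload.get("headline", "")
    if not headline:
        return jsonify(error="missing 'headline' field"), 400
    label = int(model.predict([headline])[0])
    return jsonify(headline=headline, sarcastic=bool(label))
```

On Heroku, a Flask app like this is commonly served by gunicorn through a Procfile line such as `web: gunicorn app:app`, assuming the file is named `app.py`; the repo's actual setup may differ.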
Although this basic model is not state of the art, it can serve as a quick-and-dirty baseline for most text classification problems at the start of a project. Depending on the problem at hand, we can gradually improve it with more sophisticated text preprocessing methods or other state-of-the-art models.