mmmeeedddsss / sarcasm_detection

Ceng790 Term Project

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Sarcasm Detection using Spark

Mert Tunç, Egemen Berk Galatalı


1.3 million reddit comments that is labeled as sarcastic or not is used as dataset to create a sarcasm classifier. Several methods for preprocessing, feature extraction and ml models are combined to get the best results. Implementation is done using Scala with Spark.

77% accurcacy on test set is taken with the best preprocessing - feature extraction - ml model selection - semi optimized hyper parameters combination. Please note that no other coloumns than comments itselves and labels are used on training or testing.

About

Ceng790 Term Project


Languages

Language:Scala 100.0%