soujanyaporia / CASCADE--ContextuAl-SarCAsm-DEtector

CASCADE: Contextual Sarcasm Detection in Online Discussion Forums

Code for the paper CASCADE: Contextual Sarcasm Detection in Online Discussion Forums (COLING 2018).

Description

In this paper, we propose a ContextuAl SarCasm DEtector (CASCADE), which adopts a hybrid approach of both content- and context-driven modeling for sarcasm detection in online social media discussions (Reddit).

Requirements

  1. Python (2.7 or 3)
  2. TensorFlow (1.4.0)
  3. FastText pre-trained embeddings
  4. Download and save user_gcca_embeddings.npz at ./CASCADE/users/user_embeddings/

Optional

To train the user embeddings yourself, download the dataset file comments.json [1] from this link and train-balanced.csv from this link, and save both files inside the folder ./CASCADE/data/.

Preprocessing

User Embeddings

User Embeddings: Stylometric features

The file ./CASCADE/data/comments.json contains users and their corresponding comments. A user may have multiple comments, so we concatenate all the comments belonging to a user, separated by the <END> tag:

1. cd users
2. python create_per_user_paragraph.py
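
The concatenation step can be sketched as follows. This is a minimal illustration and not the contents of create_per_user_paragraph.py: it assumes the comments are available as (user, text) pairs, whereas the actual layout of comments.json may differ. The function name build_user_paragraphs and the spaces around the tag are assumptions made for the example.

```python
from collections import defaultdict

# Separator named in the text; the surrounding spaces are an assumption.
END_TAG = " <END> "


def build_user_paragraphs(comments):
    """Group (user, text) pairs by user and join each user's texts with <END>."""
    per_user = defaultdict(list)
    for user, text in comments:
        per_user[user].append(text.strip())
    return {user: END_TAG.join(texts) for user, texts in per_user.items()}


if __name__ == "__main__":
    sample = [
        ("alice", "Great, another Monday."),
        ("bob", "Nice weather today."),
        ("alice", "I just love waiting in line."),
    ]
    for user, paragraph in build_user_paragraphs(sample).items():
        print(user, "->", paragraph)
```

Each resulting paragraph is then a single document per user, which is the unit a paragraph-embedding model is trained on.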

The ParagraphVector algorithm is used to generate the stylometric features. First, train the model:

3. python train_stylometric.py

Then generate user_stylometric.csv (user stylometric features) using the trained model:

4. python generate_stylometric.py

User Embeddings: Personality features

Pre-train a CNN-based model to detect personality features from text. The code uses two datasets for training. The second dataset [2] can be obtained on request from its original authors.

5. python process_data.py [path/to/FastText_embedding]
6. python train_personality.py

To use the pre-trained model from our experiments, download the model weights (personality_model_weights.zip) and unzip them inside the folder ./CASCADE/user/

Generate user_personality.csv (user personality features) using this model:

7. python generate_user_personality.py

User Embeddings: Multi-view fusion

Merge user_stylometric.csv and user_personality.csv into a single file, user_view_vectors.csv:

8. python merge_user_views.py
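
As a rough sketch of what merging two views involves (not the actual merge_user_views.py), the example below assumes each CSV row holds a user id followed by that user's feature values, and keeps only users that appear in both files. The row layout and the helper names read_view and merge_views are hypothetical.

```python
import csv
import io


def read_view(handle):
    """Read rows of the assumed form `user_id, f1, f2, ...` into {user_id: [floats]}."""
    return {row[0]: [float(v) for v in row[1:]] for row in csv.reader(handle) if row}


def merge_views(stylometric, personality):
    """Keep users present in both views and pair up their two feature vectors."""
    shared = sorted(set(stylometric) & set(personality))
    return [(user, stylometric[user], personality[user]) for user in shared]


if __name__ == "__main__":
    # In-memory stand-ins for the two CSV files (made-up values).
    sty = read_view(io.StringIO("u1,0.1,0.2\nu2,0.3,0.4\n"))
    per = read_view(io.StringIO("u1,0.9\nu3,0.5\n"))
    for user, s_vec, p_vec in merge_views(sty, per):
        print(user, s_vec, p_vec)
```

Restricting the output to users found in both views matters because multi-view fusion needs aligned rows: row i of every view must describe the same user.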

Multi-view fusion of the two user views (stylometric and personality) is performed using GCCA, which is approximately equivalent to CCA when there are only two views. Generate the fused user embeddings user_gcca_embeddings.npz with the following command:

9. python user_wgcca.py --input ./user_embeddings/user_view_vectors.csv --output ./user_embeddings/user_gcca_embeddings.npz --k 100 --no_of_views 2

This implementation of GCCA has been adapted from https://github.com/abenton/wgcca.
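
To give an intuition for the two-view case, here is a minimal NumPy sketch of classical CCA. It is not the weighted GCCA code in user_wgcca.py: it uses the textbook whitening-plus-SVD formulation, and the function names, the regularization constant reg, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def inv_sqrt(mat):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def cca(x, y, k, reg=1e-6):
    """Return projection matrices (wx, wy) and the top-k canonical correlations."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    # Regularized covariance and cross-covariance estimates.
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    cxx_is, cyy_is = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    u, s, vt = np.linalg.svd(cxx_is @ cxy @ cyy_is)
    return cxx_is @ u[:, :k], cyy_is @ vt.T[:, :k], s[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = rng.normal(size=(500, 2))  # latent signal common to both views
    view1 = np.hstack([shared, rng.normal(size=(500, 3))])
    view2 = np.hstack([shared @ rng.normal(size=(2, 2)), rng.normal(size=(500, 2))])
    wx, wy, corr = cca(view1, view2, k=2)
    print("canonical correlations:", np.round(corr, 3))
```

Projecting each view with its matrix (for example `view1 @ wx`) yields k-dimensional representations that are maximally correlated across views, which is the sense in which the two user views are fused.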


Discourse Embeddings

As with the user stylometric features, create the discourse features for each discussion forum (subreddit):

10. cd discourse
11. python create_per_discourse_paragraph.py

The ParagraphVector algorithm is used to generate the discourse features. First, train the model:

12. python train_discourse.py

Then generate discourse.csv (discourse features) using the trained model:

13. python generate_discourse.py

Running CASCADE

Hybrid CNN

The hybrid CNN combines user embeddings and discourse features with textual modeling:

14. cd src
15. python process_data.py [path/to/FastText_embedding]
16. python train_cascade.py

The CNN codebase has been adapted from https://github.com/dennybritz/cnn-text-classification-tf
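
The idea of combining a CNN text representation with user and discourse vectors can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the TensorFlow model in train_cascade.py: it assumes the three vectors are simply concatenated before a single logistic output unit, and all names, shapes, and random weights below are hypothetical.

```python
import numpy as np


def text_cnn_features(word_vecs, filters):
    """Slide each filter over consecutive words, apply ReLU, then max-pool over time.

    word_vecs: (n_words, emb_dim); filters: (n_filters, width, emb_dim).
    """
    width = filters.shape[1]
    n_windows = word_vecs.shape[0] - width + 1
    windows = np.stack([word_vecs[i:i + width].ravel() for i in range(n_windows)])
    flat_filters = filters.reshape(filters.shape[0], -1)
    activations = np.maximum(windows @ flat_filters.T, 0.0)
    return activations.max(axis=0)


def hybrid_score(word_vecs, user_vec, discourse_vec, filters, weights, bias):
    """Concatenate text, user, and discourse vectors and apply a logistic output unit."""
    text_vec = text_cnn_features(word_vecs, filters)
    fused = np.concatenate([text_vec, user_vec, discourse_vec])
    return 1.0 / (1.0 + np.exp(-(fused @ weights + bias)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    words = rng.normal(size=(12, 8))      # 12 words, 8-dim embeddings (toy sizes)
    filters = rng.normal(size=(4, 3, 8))  # 4 filters spanning 3 words each
    user_vec = rng.normal(size=5)         # stand-in for a fused user embedding
    discourse_vec = rng.normal(size=5)    # stand-in for a discourse feature vector
    weights = rng.normal(size=4 + 5 + 5)  # untrained, random output weights
    print("sarcasm probability:", hybrid_score(words, user_vec, discourse_vec, filters, weights, 0.0))
```

With random weights the output is meaningless; the point is only the data flow, in which the context vectors enter the classifier alongside the content features.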

Citation

If you use this code in your work, please cite the paper "CASCADE: Contextual Sarcasm Detection in Online Discussion Forums" using the following entry:

@article{hazarika2018cascade,
  title={CASCADE: Contextual Sarcasm Detection in Online Discussion Forums},
  author={Hazarika, Devamanyu and Poria, Soujanya and Gorantla, Sruthi and Cambria, Erik and Zimmermann, Roger and Mihalcea, Rada},
  journal={arXiv preprint arXiv:1805.06413},
  year={2018}
}

References

[1]. Khodak, Mikhail, Nikunj Saunshi, and Kiran Vodrahalli. "A large self-annotated corpus for sarcasm." arXiv preprint arXiv:1704.05579 (2017).

[2]. Celli, Fabio, et al. "Workshop on computational personality recognition (shared task)." Proceedings of the Workshop on Computational Personality Recognition. 2013.
