This repository accompanies the paper "Detecting Hate Speech Contents Using Embedding Models". The paper has been submitted to the CSoNet 2021 conference and is under review. A public version will be available on arXiv soon.
The `src` directory contains the source code as IPYNB notebooks. The notebooks were originally created in Google Colab; you can either download them and edit them locally in Jupyter Notebook or run them in the Google Colab environment. Instructions are included in the notebooks.
The `data` directory contains the three datasets used to evaluate the proposed model: HASOC-2019, HSOF-3, and HS2-2021. We note that:
- The HASOC-2019 dataset has 5,853 training instances and 1,154 test instances; each instance is labeled as hate speech or not.
- The HSOF-3 dataset has 24,802 instances and three labels, i.e., hate speech, offensive language, and neither.
- The HS2-2021 dataset has 23,169 instances labeled as hate speech; the remaining 8,619 instances are not labeled as hate speech.
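The dataset sizes listed above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not code from the notebooks: the dictionary layout and the key names (`train`, `test`, `all`, `hate_speech`, `other`) are illustrative assumptions, and it assumes the two HS2-2021 counts describe disjoint groups that together make up the whole dataset.

```python
# Instance counts as stated in this README.
# The key names are hypothetical labels chosen for this sketch.
DATASETS = {
    "HASOC-2019": {"train": 5853, "test": 1154},
    "HSOF-3": {"all": 24802},
    "HS2-2021": {"hate_speech": 23169, "other": 8619},
}


def total_instances(parts: dict) -> int:
    """Sum the instance counts of all parts of one dataset."""
    return sum(parts.values())


def share(parts: dict, key: str) -> float:
    """Fraction of a dataset's instances that fall under `key`."""
    return parts[key] / total_instances(parts)


if __name__ == "__main__":
    for name, parts in DATASETS.items():
        print(f"{name}: {total_instances(parts):,} instances")
    hs2_share = share(DATASETS["HS2-2021"], "hate_speech")
    print(f"HS2-2021 hate speech share: {hs2_share:.1%}")
```

Under these assumptions, HS2-2021 is noticeably imbalanced toward the hate speech class, which may be worth keeping in mind when comparing metrics across the three datasets.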
The hate speech dictionary is available in the `dictionary` directory; the current version contains 766 terms.
We report the number of trainable parameters, in millions, for each experimental setup. Experiments 1 to 3 use only word embeddings (WE), which are generated by the word2vec model. Experiments 4 to 6 combine word embeddings with hate speech embeddings (HSE). We also fine-tune the BERTweet model for comparison. We consider three types of neural network models: multilayer perceptron (MLP), BiLSTM, and CNN.
| # | Models | HASOC-2019 | HSOF-3 | HS2-2021 |
|---|---|---|---|---|
| 1 | WE + MLP | 3.5 | 5.2 | 7.1 |
| 2 | WE + CNN | 4 | 5.7 | 7.6 |
| 3 | WE + BiLSTM | 3.7 | 5.4 | 7.3 |
| 4 | [WE + HSE] + MLP | 3.5 | 5.2 | 7.1 |
| 5 | [WE + HSE] + CNN | 4 | 5.7 | 7.6 |
| 6 | [WE + HSE] + BiLSTM | 3.7 | 5.4 | 7.3 |
| 7 | BERTweet + Softmax | 135 | 135 | 135 |
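To make the "parameters in millions" figures easier to interpret, here is a rough sketch of how such a count can be derived for an embedding layer followed by an MLP. It is not the architecture from the notebooks: the function names, the vocabulary size, the embedding dimension, and the hidden layer size are all hypothetical, and it assumes the embedding table is trainable and that the MLP receives one pooled vector per input.

```python
def embedding_params(vocab_size: int, dim: int) -> int:
    """One trainable vector of length `dim` per vocabulary entry."""
    return vocab_size * dim


def dense_params(n_in: int, n_out: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out


def mlp_params(vocab_size: int, embed_dim: int,
               hidden_sizes: list, num_classes: int) -> int:
    """Total trainable parameters: embedding + hidden layers + output layer."""
    total = embedding_params(vocab_size, embed_dim)
    n_in = embed_dim  # assumes the embeddings are pooled into one vector
    for n_out in hidden_sizes:
        total += dense_params(n_in, n_out)
        n_in = n_out
    total += dense_params(n_in, num_classes)
    return total


def in_millions(n: int) -> float:
    """Express a parameter count in millions, rounded to one decimal."""
    return round(n / 1_000_000, 1)


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to illustrate the arithmetic.
    n = mlp_params(vocab_size=10_000, embed_dim=300,
                   hidden_sizes=[128], num_classes=2)
    print(n, in_millions(n))
```

In a setup like this one, the embedding table dominates the total, so the count depends mostly on vocabulary size. Whether that explains the differences between datasets in the table above is not stated in this README.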
- Phuc H. Duong, Cuong C. Chung, Loc T. Vo (AI-LAB, Faculty of Information Technology, Ton Duc Thang University, Vietnam).
- Hien T. Nguyen (Department of Economic Mathematics, Banking University of Ho Chi Minh City, Vietnam).
- Dat Ngo (NewAI Research, Vietnam).