The proliferation of hate speech on social media poses serious societal challenges, from mental health impacts to fostering discrimination and violence. Addressing this, our project aims to automate the detection of hate speech, utilizing a diverse dataset and advanced machine learning techniques. Our goal is to enhance online safety and inclusivity by accurately classifying texts as hate speech or non-hate speech.
- Introduction
- Requirements
- Related Work and Methodologies
- Dataset and Evaluation
- Methods
- Results
- Conclusion
- References
Clone the repository and install the required packages:
```shell
git clone https://github.com/Clara-z/CSCI467-final-project.git
cd CSCI467-final-project
pip install -r requirements.txt
```
Train and evaluate the models:

```shell
python baseline.py
python naive_bayes.py
python BERT.py
```
To ensure our models perform robustly on unseen data, we implemented the following strategies:
- Cross-Validation: We utilized k-fold cross-validation, particularly in the Naive Bayes model, to enhance the model's ability to generalize across different data samples.
- Early Stopping: In the BERT model, we incorporated an early stopping mechanism during training. This approach halts the training process if the model's performance on the validation set does not improve for a predetermined number of epochs, thus preventing overfitting.
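The patience-based early stopping described above can be sketched as a small helper; this is an illustrative reimplementation, and the function name `should_stop` and default patience are ours, not necessarily the project's exact code:

```python
def should_stop(val_losses, patience=3):
    """Return True if the validation loss has not improved for
    `patience` consecutive epochs (patience-based early stopping)."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    # Stop only if none of the last `patience` epochs beat the earlier best.
    return min(val_losses[-patience:]) >= best_before
```

For example, `should_stop([0.9, 0.8, 0.7, 0.71, 0.72, 0.73], patience=3)` returns `True`, since the last three epochs never improved on the best loss of 0.7.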
Informed by the research of Kennedy et al. (2020) and Dixon et al. (2018), our approach emphasizes the crucial role of context in hate speech detection. Utilizing the advanced capabilities of BERT, known for its deep contextual analysis, we aim to capture the subtle and complex nuances of hate speech. This methodology aligns with the current best practices in the field, advocating for balanced datasets and context-aware models to ensure effective and unbiased detection of hate speech across various online platforms.
In optimizing our BERT model, we utilized Weights & Biases (WandB) for hyperparameter tuning. WandB's capabilities in experiment tracking and hyperparameter space exploration greatly enhanced our model's performance. By systematically evaluating combinations of learning rates, optimizer types, and batch sizes, WandB helped identify the most effective settings for our model, leading to improved accuracy and efficiency in hate speech detection.
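A WandB sweep over learning rate, optimizer, and batch size might be declared as below; the search method, metric name, and parameter values shown here are illustrative examples, not the project's tuned settings:

```python
# Illustrative WandB sweep configuration; all values are examples,
# not the settings actually used in this project.
sweep_config = {
    "method": "bayes",  # Bayesian search over the hyperparameter space
    "metric": {"name": "val_f1", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [1e-5, 3e-5, 5e-5]},
        "optimizer": {"values": ["adam", "adamw"]},
        "batch_size": {"values": [16, 32, 64]},
    },
}

# A sweep would then be launched with, e.g.:
# sweep_id = wandb.sweep(sweep_config, project="hate-speech-detection")
# wandb.agent(sweep_id, function=train)
```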
The project utilizes the "Dynamically Generated Hate Speech Dataset" from Kaggle, featuring 40,463 entries. This dataset is meticulously balanced, with a near-equal distribution of hate speech and non-hate categories.
- Training Set (70%): Used for model training.
- Development Set (10%): Used for model evaluation and tuning.
- Testing Set (20%): Provides an unbiased performance measure on unseen data.
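The 70/10/20 split above can be produced with two stratified splits; this is a sketch under the assumption that scikit-learn is used, and the function name `split_dataset` is ours:

```python
from sklearn.model_selection import train_test_split

def split_dataset(texts, labels, seed=42):
    """Illustrative 70/10/20 train/dev/test split (stratified so each
    split keeps the dataset's near-equal label balance)."""
    # First carve off the 20% test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        texts, labels, test_size=0.20, random_state=seed, stratify=labels)
    # Dev is 10% of the full data, i.e. 12.5% of the remaining 80%.
    X_train, X_dev, y_train, y_dev = train_test_split(
        X_rest, y_rest, test_size=0.125, random_state=seed, stratify=y_rest)
    return (X_train, y_train), (X_dev, y_dev), (X_test, y_test)
```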
Metrics used for evaluation include:
- Accuracy: General model reliability.
- Precision: Minimizing false positives.
- Recall: Detecting all instances of hate speech.
- F1-Score: Harmonic mean of precision and recall.
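All four metrics are standard and available in scikit-learn; a minimal sketch on toy predictions (the labels here are illustrative, with 1 = hate speech and 0 = non-hate):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy gold labels and predictions for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # fraction of correct labels
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of P and R
```

With 3 true positives, 1 false positive, and 1 false negative in this toy example, all four metrics come out to 0.75.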
Serving as a foundational comparison point, this model predicts the most frequent label in the training set for every input, regardless of the text.
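The majority-class baseline is simple enough to sketch in full; this is an illustrative reimplementation, not necessarily the code in `baseline.py`:

```python
from collections import Counter

class MajorityBaseline:
    """Predicts the most frequent training label for every input,
    ignoring the text entirely (illustrative reimplementation)."""

    def fit(self, texts, labels):
        # Remember only the most common label seen during training.
        self.majority_label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.majority_label] * len(texts)
```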
This probabilistic model, based on Bayes’ theorem, is enhanced with TF-IDF vectorization and optimized through cross-validation and hyperparameter tuning, focusing on linguistic patterns in text classification.
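The TF-IDF plus Naive Bayes combination, with cross-validated hyperparameter tuning, can be sketched as a scikit-learn pipeline; the n-gram range, smoothing grid, and scoring choice below are illustrative assumptions, not the project's tuned values:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV

# TF-IDF features feeding a multinomial Naive Bayes classifier.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
    ("nb", MultinomialNB()),
])

# k-fold cross-validated search over the Laplace smoothing parameter.
param_grid = {"nb__alpha": [0.1, 0.5, 1.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
```

Calling `search.fit(texts, labels)` then selects the alpha with the best cross-validated F1 and refits the pipeline on the full training set.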
BERT, with its deep learning architecture and contextual word understanding, represents a significant advancement in hate speech detection. We fine-tune BERT-tiny (a smaller pre-trained BERT variant) with an added dense classification layer and tune its hyperparameters with Weights & Biases (WandB), ensuring robust performance in complex linguistic analysis.
Our study reveals distinct performance variations among the Majority Classifier, Naive Bayes, and BERT models. BERT's superior performance highlights its advanced capabilities in contextual understanding, though it also encounters challenges with ambiguities and complex sentence structures.
Our evaluation of various models, including Naive Bayes and BERT, underscores the significance of advanced models for effective hate speech detection. We propose future work focusing on data enrichment, fine-tuning sensitivity to subtleties, and hybrid modeling approaches for more refined hate speech detection systems.
- Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- Dixon, L., Li, J., Sorensen, J., Thain, N., and Vasserman, L. Measuring and mitigating unintended bias in text classification. In AAAI/ACM Conference on AI, Ethics, and Society, pp. 67–73, 2018.
- Kennedy, B., Jin, X., Davani, A. M., Dehghani, M., and Ren, X. Contextualizing hate speech classifiers with post-hoc explanation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 5435–5442, 2020.
- Sharengaraju, U. Dynamically generated hate speech dataset. Kaggle, 2021. URL https://www.kaggle.com/datasets/usharengaraju/dynamically-generated-hate-speech-dataset/data.