Lexicools at SemEval-2023 Task 10: Sexism Lexicon Construction via XAI

This repo is our work on the SemEval-2023 Task 10 Explainable Detection of Online Sexism (EDOS)

The main idea is to develop lexicon-based models for sexism classifier as we prioritize explanability over performance.

Our approach consists of three main steps:

Lexicon Construction: based on Pointwise Mutual Information (PMI) and Shapley value
Augmentation: by utilising an unannotated corpus, BERTweet and GPT-J
Incorporating Lexicons: for bag-of-word logistic regression and fine-tuning LLMs.

Our results demonstrate that our Shapley approach effectively produces a high-quality lexicon. We also show that simply counting the presence of certain words in our lexicons and comparing the count can outperform a BoW logistic regression in task B/C and fine-tuning BERT in task C. In the end, our classifier achieved F1-scores of 53.34% and 27.31% on the official blind test sets for tasks B and C, respectively.

Project Structure

Please follow these steps to re-run our experiments.

Find-tune a model in Task A

TaskA Fine-tuning BERT.ipynb
TaskA Fine-tuning BERTweet.ipynb
TaskA Fine-tuning TwHINBERT.ipynb

Contruct the lexicons:

TaskB Lexicon Construciton [PMI].ipynb
TaskB Lexicon Construciton [Shapley].ipynb

Generate more data with GPT-J:

Generate data w: GPT-J.ipynb

Augment the lexicons:

TaskB Augmentation w_ XAI.ipynb
TaskB Augmentation with BERTTweets.ipynb
TaskB Augmentation with GPT-J.ipynb

Incorperate the lexicons:

TaskB Incorporate BERT.ipynb
TaskB Logistic Regression.ipynb

See our resulting lexicons in Lexicons

Authors

Citation

If you use our lexicon or accompanying materials, please cite:

@inproceedings{nakwijit-samir-semeval-2023,
    title = "Lexicools at SemEval-2023 Task 10: Sexism Lexicon Construction via XAI",
    author = "Nakwijit, Pakawat and
              Samir, Mahmoud and
              Purver, Matthew",
    booktitle = "Proceedings of the 16th International Workshop on Semantic Evaluation (SemEval-2022)",
    month = july,
    year = "2022",
    address = "Seattle",
    publisher = "Association for Computational Linguistics"
}

More information about the task and data

Plese visit Explainable Detection of Online Sexism (EDOS)

aascode / SemEval2023-Task10

Lexicools at SemEval-2023 Task 10: Sexism Lexicon Construction via XAI

Project Structure

Authors

Citation

More information about the task and data

About

Languages