This repository contains code and models for the paper: Disentangling Document Topic and Author Gender in Multiple Languages: Lessons for Adversarial Debiasing.
```
@inproceedings{dayanik21:_disen_docum_topic_author_gender_multip_languag,
  author = {Dayanik, Erenay and Padó, Sebastian},
  biburl = {https://puma.ub.uni-stuttgart.de/bibtex/29f3e2e70efa78c0dd97ae2f4b2f071ac/sp},
  booktitle = {Proceedings of the EACL WASSA workshop},
  note = {To appear},
  title = {Disentangling Document Topic and Author Gender in Multiple Languages: Lessons for Adversarial Debiasing},
  year = 2021
}
```
$ git clone https://github.com/wassa21/adv.git
$ cd adv
$ pip install -r requirements.txt
Please see `./data` for information about the dataset.
To train the models, run the corresponding script (each block below starts from the repository root):

$ cd src/topic_classification
$ bash run.sh

$ cd src/gender_classification
$ bash run.sh

$ cd src/topic_classification_with_adv_gender
$ bash run.sh

$ cd src/gender_classification_with_adv_topic
$ bash run.sh
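The two `*_with_adv_*` variants train the main classifier against an adversary that tries to recover the protected attribute. As a generic sketch of the usual adversarial-debiasing objective (the loss form and the weight `lam` are illustrative assumptions, not necessarily the paper's exact setup):

```python
import math

def cross_entropy(probs, gold_index):
    """Negative log-likelihood of the gold class (toy, single example)."""
    return -math.log(probs[gold_index])

# Toy output distributions for one document (all numbers illustrative):
topic_probs = [0.7, 0.2, 0.1]   # main task head (topic)
gender_probs = [0.6, 0.4]       # adversary head (author gender)
lam = 1.0                       # adversary weight, a hypothetical hyperparameter

task_loss = cross_entropy(topic_probs, 0)
adv_loss = cross_entropy(gender_probs, 0)

# The encoder minimizes the task loss while *maximizing* the adversary's
# loss (e.g. via a gradient reversal layer), so its objective is:
encoder_objective = task_loss - lam * adv_loss
```

The adversary itself is trained normally (to minimize `adv_loss`); only the encoder sees the reversed sign.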
- Each `run.sh` script above will save the model with the best weighted F-score to `lessons_for_adversarial_debiasing/models` and its predictions on the test set to `lessons_for_adversarial_debiasing/outputs`.
- By default, prediction file names are generated from the following template: `{LANG}_{ix}_BERT_SUM_MLP_{DATE}_best_model_outputs.csv`, where
  - `LANG`: 'de', 'es', 'fr', or 'tr';
  - `ix`: 0, 1, 2, 3, or 4, indicating one of the five randomly generated test sets;
  - `DATE`: the system date and time when the script was run.
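The template above can be unpacked mechanically; a minimal sketch of a parser (the helper name and the regex are illustrative, not part of the repository):

```python
import re

# Matches {LANG}_{ix}_BERT_SUM_MLP_{DATE}_best_model_outputs.csv
PATTERN = re.compile(
    r"(?P<lang>de|es|fr|tr)_(?P<ix>[0-4])_BERT_SUM_MLP_"
    r"(?P<date>.+)_best_model_outputs\.csv$"
)

def parse_prediction_name(name):
    """Return {'lang', 'ix', 'date'} for a prediction file name, or None."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

info = parse_prediction_name(
    "de_1_BERT_SUM_MLP_2020-05-25_21-45-03_best_model_outputs.csv"
)
# info == {"lang": "de", "ix": "1", "date": "2020-05-25_21-45-03"}
```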
- To obtain a weighted F-score evaluation on these prediction files, use `src/evaluate.py`. It expects 3 arguments:
  - `argv[1]`: path of the prediction file (example: `de_1_BERT_SUM_MLP_2020-05-25_21-45-03_best_model_outputs.csv`)
  - `argv[2]`: task type (either `gender` or `topic`)
  - `argv[3]`: `is_adv` (either `true` or `false`)

For example, to evaluate the predictions of a gender classifier (Table 4), one can use the following command:
$ python evaluate.py GenderPredictor_BERT_SUM_MLP_2020-05-25_21-45-03 gender false
This command will evaluate gender classifiers trained on DE, ES, FR, and TR transcripts of TED talks.
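The internals of `evaluate.py` are not shown here; for reference, the weighted F-score it reports can be sketched in pure Python (a support-weighted average of per-class F1, equivalent to `sklearn.metrics.f1_score(..., average='weighted')`):

```python
from collections import Counter

def weighted_f1(gold, pred):
    """Per-class F1 averaged with weights proportional to each
    class's support in the gold labels."""
    total = len(gold)
    score = 0.0
    for label, support in Counter(gold).items():
        tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = support - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (support / total) * f1
    return score
```

For example, `weighted_f1(["f", "f", "m", "m"], ["f", "m", "m", "m"])` gives 11/15 ≈ 0.733.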
- Use `src/evaluate_mb.py` to evaluate the majority baseline (with the same command-line arguments).
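For reference, a majority baseline simply predicts the most frequent training label for every test item; a minimal sketch (not the repository's actual implementation):

```python
from collections import Counter

def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test instance."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test
```

For example, `majority_baseline(["f", "f", "m"], 2)` returns `["f", "f"]`.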