Based on https://github.com/BrendanKennedy/contextualizing-hate-speech-models-with-explanations
SOC debiasing requires a pretrained LM. `run_model.py` will automatically train one if not found in `runs/lm/`. We provide a pre-trained LM.
- Set `--gradient_accumulation_steps` if you run out of GPU memory.
- I removed Nvidia Apex because it no longer works well in 2023. I didn't bother to implement AMP with `torch.amp` since BERT-base is not that large; a sketch of how it could be added is shown after this list.
- Remove the loop in the training scripts if you don't need multiple runs with different seeds.
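If you do want mixed precision, below is a minimal sketch of how `torch.amp` and gradient accumulation could be combined in a plain PyTorch training loop. This is generic PyTorch, not code from this repo; the toy model and data only keep the example self-contained, and `accumulation_steps` plays the role of `--gradient_accumulation_steps`.

```python
# Generic sketch (not from this repo): torch.amp mixed precision + gradient accumulation.
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(768, 2).to(device)                      # stand-in for the BERT classifier
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # torch.amp.GradScaler in newer PyTorch
accumulation_steps = 4                                    # what --gradient_accumulation_steps controls

# Toy batches so the snippet runs on its own.
data = [(torch.randn(8, 768), torch.randint(0, 2, (8,))) for _ in range(16)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    x, y = x.to(device), y.to(device)
    with torch.amp.autocast(device_type=device, enabled=(device == "cuda")):
        loss = nn.functional.cross_entropy(model(x), y)
    # Divide the loss so gradients average over the accumulated micro-batches.
    scaler.scale(loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```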
- `scripts/fou_vanilla.sh` trains a model without debiasing.
- `scripts/fou_soc.sh` trains a model with SOC debiasing.
- Use `scripts/test_toxigen.sh` to get accuracy and F1 scores on Toxigen. It will also create a prediction file used for the other tests below (a sketch of the metric computation follows this list).
- Run `scripts/test_bias_founta.sh` to get `runs/founta_*/founta_bias_eval.csv`. The bias metrics are from https://arxiv.org/abs/2102.00086
- Run `scripts/calc_target_group_fpr.sh` to get the FPR of each target group in Toxigen. The output is in `runs/founta_*/toxigen_fpr.csv` (a sketch of this computation follows this list).
- `scripts/explain_soc.sh` calculates the weight of every word in the samples specified in `data/toxigen_soc_line_numbers.csv` (a simplified occlusion sketch follows this list).
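For reference, here is a minimal sketch of how accuracy and F1 could be computed from a prediction file. The file path and the `label`/`pred` column names are assumptions for illustration, not the repo's actual schema; `scripts/test_toxigen.sh` already reports these numbers.

```python
# Hedged sketch: accuracy and F1 from a prediction file.
# The path and the "label"/"pred" column names are assumptions, not the repo's schema.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

df = pd.read_csv("runs/founta_soc/toxigen_preds.csv")  # assumed prediction file
print("accuracy:", accuracy_score(df["label"], df["pred"]))
print("f1:", f1_score(df["label"], df["pred"]))
```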
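Per-target-group FPR is the share of non-toxic examples that the model flags as toxic, grouped by target group. A sketch under the same assumed schema, plus an assumed `target_group` column:

```python
# Hedged sketch: FPR per target group = FP / (FP + TN), i.e. the mean prediction
# over benign (label == 0) examples, assuming "pred" is a 0/1 prediction column.
import pandas as pd

df = pd.read_csv("runs/founta_soc/toxigen_preds.csv")   # assumed prediction file
benign = df[df["label"] == 0]                            # FPR is defined on non-toxic examples
fpr = benign.groupby("target_group")["pred"].mean()      # assumed "target_group" column
print(fpr)
```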
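To give a feel for what the word weights mean, here is a much-simplified occlusion-style sketch: the weight of a word is the drop in the toxic logit when that word is masked out. This is not the repo's SOC implementation (SOC additionally marginalizes over contexts sampled from the pretrained LM), and the checkpoint path and label index below are assumptions.

```python
# Simplified occlusion-style word importance; NOT the repo's SOC algorithm.
# SOC additionally samples surrounding context from the pretrained LM before occluding.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("runs/founta_soc")                    # assumed checkpoint path
model = AutoModelForSequenceClassification.from_pretrained("runs/founta_soc")
model.eval()

def toxic_logit(text: str) -> float:
    with torch.no_grad():
        logits = model(**tok(text, return_tensors="pt")).logits
    return logits[0, 1].item()                                            # assumes class 1 = toxic

def word_weights(sentence: str):
    """Weight of each word = drop in the toxic logit when that word is masked."""
    words = sentence.split()
    base = toxic_logit(sentence)
    return [
        (w, base - toxic_logit(" ".join(words[:i] + [tok.mask_token] + words[i + 1:])))
        for i, w in enumerate(words)
    ]

print(word_weights("an example sentence to explain"))
```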