Named Entity Biases in Pre-trained Language Models

This repository contains the code for the experiments in the paper:

"You are grounded!": Latent Name Artifacts in Pre-trained Language Models.

Vered Shwartz, Rachel Rudinger, and Oyvind Tafjord. EMNLP 2020.

1. Last Name Prediction:

Run the script predict_last_name.sh. It will produce the file data/all_names_results.tsv.
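
A minimal Python sketch for running this step and loading its output. It assumes bash is on the PATH and that the script is run from the repository root. It also assumes the TSV starts with a header row. The column names are not documented here, so the loader returns one dict per row keyed by whatever the header contains. The helper names are hypothetical.

```python
import csv
import subprocess
from pathlib import Path

# Output file documented for step 1 (relative to the repository root).
RESULTS_TSV = Path("data") / "all_names_results.tsv"


def run_last_name_prediction(repo_dir="."):
    """Run predict_last_name.sh and return the path of the expected output file.

    Assumes bash is available; raises CalledProcessError if the script fails.
    """
    subprocess.run(["bash", "predict_last_name.sh"], cwd=repo_dir, check=True)
    return Path(repo_dir) / RESULTS_TSV


def parse_tsv(lines):
    """Parse tab-separated lines into one dict per row, keyed by the header row.

    Assumes the first line is a header; the real column names are not documented.
    """
    return list(csv.DictReader(lines, delimiter="\t"))


def load_tsv(path):
    """Read a TSV file from disk and parse it with parse_tsv."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_tsv(f)
```

For example, `rows = load_tsv(run_last_name_prediction())` runs the script and then reads every row of the results file.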

2. Given Name Recovery:

Run the script predict_given_name.sh [device], where [device] is either a GPU number or "cpu". It saves the results in the results directory, as one JSON file per language model.
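
A minimal Python sketch for this step. The device check mirrors the documented choice of a GPU number or "cpu". The loader assumes that each per-model file is a *.json file directly under results. The file names and JSON schema are not documented in this README, so the parsed contents are returned as-is. Both helper names are hypothetical.

```python
import json
from pathlib import Path


def given_name_command(device):
    """Build the command line for predict_given_name.sh.

    device: "cpu" or a non-negative GPU number (int or digit string).
    """
    device = str(device)
    if device != "cpu" and not device.isdigit():
        raise ValueError(f"device must be 'cpu' or a GPU number, got {device!r}")
    return ["bash", "predict_given_name.sh", device]


def load_results(results_dir="results"):
    """Map each JSON file under results_dir from file stem to parsed content.

    Assumes one *.json file per language model directly under results_dir.
    """
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(results_dir).glob("*.json"))
    }
```

For example, `subprocess.run(given_name_command(0), check=True)` runs the script on GPU 0. After that, `load_results()` returns the parsed output for each model, keyed by file name.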

3. Sentiment Analysis:

Using the names generated in the previous step, run the script sentiment_analysis.py --text_dir [text_dir] --device [device]. It will produce a LaTeX table with the results.
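
A minimal Python sketch that assembles this command and checks the arguments before launching it. It assumes that --device follows the same "cpu" or GPU-number convention as step 2, which this README does not state for sentiment_analysis.py. The helper name is hypothetical.

```python
import sys
from pathlib import Path


def sentiment_command(text_dir, device="cpu"):
    """Build the argument list for sentiment_analysis.py.

    text_dir: directory holding the output of the previous step.
    device: "cpu" or a GPU number (assumed to match the convention of step 2).
    """
    text_dir = Path(text_dir)
    if not text_dir.is_dir():
        raise FileNotFoundError(f"text_dir does not exist: {text_dir}")
    device = str(device)
    if device != "cpu" and not device.isdigit():
        raise ValueError(f"device must be 'cpu' or a GPU number, got {device!r}")
    # Use the current interpreter so the script runs in the same environment.
    return [sys.executable, "sentiment_analysis.py",
            "--text_dir", str(text_dir), "--device", device]
```

The list can be passed directly to `subprocess.run(..., check=True)`. Validating early avoids starting a long model run with a mistyped directory or device.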

4. Effect on Downstream Tasks:

The downstream directory contains the templates, the sampled names to assign to the templates, and a notebook to run the experiments for Winogrande and SQuAD.
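
Assigning sampled names to templates can be sketched in Python as follows. This illustrates the idea only and is not the notebook's code. It assumes a hypothetical `{name}` placeholder written in str.format syntax, and the actual template format in the downstream directory may differ.

```python
from itertools import product


def fill_templates(templates, names, placeholder="name"):
    """Instantiate every template with every name (Cartesian product).

    Assumes a hypothetical {name}-style slot in str.format syntax; repeated
    slots within one template all receive the same name.
    """
    return [
        template.format(**{placeholder: name})
        for template, name in product(templates, names)
    ]
```

For example, `fill_templates(["{name} went to the store."], ["Alice", "Bob"])` returns `["Alice went to the store.", "Bob went to the store."]`.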

References

Please cite this repository using the following reference:

@inproceedings{you_are_grounded_2020,
  title={``You are grounded!'': Latent Name Artifacts in Pre-trained Language Models},
  author={Vered Shwartz and Rachel Rudinger and Oyvind Tafjord},
  booktitle={EMNLP},
  year={2020}
}
