camelop / NLP-Robustness

OOD Generalization and Detection (ACL 2020)

Home Page: https://arxiv.org/abs/2004.06100


Pretrained Transformers Improve Out-of-Distribution Robustness

How does pretraining affect out-of-distribution robustness? We create an OOD benchmark and use it to show that pretrained transformers such as BERT have substantially higher OOD accuracy and OOD detection rates compared to traditional NLP models.
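The OOD detection results in the paper score inputs by model confidence. The sketch below is illustrative only and is not this repository's AllenNLP pipeline: it assumes a fine-tuned Hugging Face classifier (the checkpoint name is a placeholder) and uses the negative maximum softmax probability as the anomaly score, so higher scores flag inputs as more OOD-like.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint; in practice a classifier fine-tuned on the
# in-distribution task would be used.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

def ood_scores(texts):
    """Negative maximum softmax probability per input (higher = more OOD-like)."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**batch).logits, dim=-1)
    return (-probs.max(dim=-1).values).tolist()

# In-distribution and out-of-distribution inputs can then be compared,
# e.g. by computing AUROC over the two sets of scores.
print(ood_scores(["a movie review", "text drawn from a different domain"]))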

This repository contains most of the code for the paper Pretrained Transformers Improve Out-of-Distribution Robustness, ACL 2020.

Requires Python 3+ and PyTorch 1.0+.

To correctly use the RoBERTa model, see allennlp_glue_patch/notes.py.

Citation

If you find this useful in your research, please consider citing:

@inproceedings{hendrycks2020pretrained,
    Author = {Dan Hendrycks and Xiaoyuan Liu and Eric Wallace and Adam Dziedzic and Rishabh Krishnan and Dawn Song},
    Booktitle = {Association for Computational Linguistics},
    Year = {2020},
    Title = {Pretrained Transformers Improve Out-of-Distribution Robustness}}

About

OOD Generalization and Detection (ACL 2020)

https://arxiv.org/abs/2004.06100


Languages

Language: Python 68.1%
Language: Jsonnet 20.1%
Language: Jupyter Notebook 7.5%
Language: Shell 4.3%