Reimplementation of "A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks"
The code is based on the official repository (PyTorch) and Hugging Face.
Original paper: Link
Training environment: Ubuntu 18.04, Python 3.6
pip3 install torch torchvision torchaudio
pip install scikit-learn
Download the bert-base-uncased checkpoint from huggingface-ckpt.
Download the bert-base-uncased vocab file from huggingface-vocab.
Download the CLINC OOS intent detection benchmark dataset from tensorflow-dataset.
The downloaded files should be arranged as follows:
Mahalanobis-BERT
├── ckpt
│   └── bert-base-uncased-pytorch_model.bin
├── dataset
│   └── clinc_oos
│       ├── train.csv
│       ├── val.csv
│       ├── test.csv
│       └── test_ood.csv
├── vocab
│   └── bert-base-uncased-vocab.txt
├── models
└── ...
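Given the layout above, the checkpoint and vocab can be loaded from the local files. This is a minimal sketch, not the repo's actual loading code: the paths mirror the directory tree, `num_labels=150` assumes the CLINC in-domain intent count, and the exact logic in `main.py` may differ.

```python
import os
import torch
from transformers import BertConfig, BertForSequenceClassification, BertTokenizer

# Default BertConfig matches the bert-base-uncased architecture;
# num_labels=150 is an assumption based on the CLINC in-domain intents.
config = BertConfig(num_labels=150)
model = BertForSequenceClassification(config)

ckpt_path = "ckpt/bert-base-uncased-pytorch_model.bin"
vocab_path = "vocab/bert-base-uncased-vocab.txt"
if os.path.exists(ckpt_path):
    state = torch.load(ckpt_path, map_location="cpu")
    # strict=False: the pretrained checkpoint has no classifier head yet.
    model.load_state_dict(state, strict=False)
if os.path.exists(vocab_path):
    tokenizer = BertTokenizer(vocab_path)
```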
In the paper, the authors conduct an OOD experiment for NLP using the CLINC OOS intent detection benchmark dataset. The OOS dataset contains data for 150 in-domain services with 150 training sentences per domain, plus 1,500 natural out-of-domain utterances. You can download the dataset at Link.
Original dataset paper and GitHub: Paper Link, Git Link
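The splits live under `dataset/clinc_oos/` as plain CSV files, so they can be read with the standard library. A sketch, with one caveat: the column names `text` and `intent` are assumptions, so check the actual CSV header before using this.

```python
import csv

def load_split(path):
    """Read one CLINC OOS split into parallel lists of utterances and labels.

    Column names "text" and "intent" are assumptions; adjust to the real header.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    texts = [r["text"] for r in rows]
    labels = [r["intent"] for r in rows]
    return texts, labels
```

Call it once per file, e.g. `load_split("dataset/clinc_oos/train.csv")`.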
python main.py --train_or_test train --device gpu --gpu 0
python main.py --train_or_test test --device gpu --gpu 0
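At test time the detector follows the idea of [1]: fit class-conditional Gaussians with a shared (tied) covariance on penultimate-layer features, then score a sample by its Mahalanobis distance to the nearest class mean. The sketch below shows only that scoring step; extracting BERT features (e.g. the [CLS] vector) is assumed to happen elsewhere, and the function names are illustrative, not the repo's API.

```python
import numpy as np

def fit_gaussians(feats, labels):
    """Estimate per-class means and a shared precision matrix from features."""
    classes = np.unique(labels)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    # Center every sample by its own class mean, then pool the covariance.
    centered = feats - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(feats)  # shared (tied) covariance
    prec = np.linalg.pinv(cov)               # precision matrix
    return means, prec

def mahalanobis_score(x, means, prec):
    """Confidence score: negative squared distance to the closest class mean."""
    diffs = means - x                                  # (num_classes, dim)
    d2 = np.einsum("cd,de,ce->c", diffs, prec, diffs)  # per-class squared dist
    return -d2.min()  # higher = more in-distribution
```

A sample is flagged as OOD when its score falls below a threshold chosen on validation data.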
[1] https://arxiv.org/pdf/1807.03888.pdf
[2] https://github.com/pokaxpoka/deep_Mahalanobis_detector