Reimplementation of "A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks"
The code is based on the official repository (PyTorch) and Hugging Face.
Original paper: Link
Training environment: Ubuntu 18.04, Python 3.6
pip3 install torch torchvision torchaudio
pip install scikit-learn
Download the bert-base-uncased checkpoint from huggingface-ckpt.
Download the bert-base-uncased vocab file from huggingface-vocab.
Download the CLINC OOS intent detection benchmark dataset from tensorflow-dataset.
The downloaded files should be arranged as follows:
Mahalanobis-BERT
├── ckpt
│   └── bert-base-uncased-pytorch_model.bin
├── dataset
│   └── clinc_oos
│       ├── train.csv
│       ├── val.csv
│       ├── test.csv
│       └── test_ood.csv
├── vocab
│   └── bert-base-uncased-vocab.txt
├── models
└── ...
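Given the layout above, the checkpoint and vocab can be loaded from the local files. This is a minimal sketch, not the repo's actual loading code: the paths mirror the directory tree, `num_labels=150` assumes the CLINC in-domain intent count, and the exact logic in `main.py` may differ.

```python
import os
import torch
from transformers import BertConfig, BertForSequenceClassification, BertTokenizer

# Default BertConfig matches the bert-base-uncased architecture;
# num_labels=150 is an assumption based on the CLINC in-domain intents.
config = BertConfig(num_labels=150)
model = BertForSequenceClassification(config)

ckpt_path = "ckpt/bert-base-uncased-pytorch_model.bin"
vocab_path = "vocab/bert-base-uncased-vocab.txt"
if os.path.exists(ckpt_path):
    state = torch.load(ckpt_path, map_location="cpu")
    # strict=False: the pretrained checkpoint has no classifier head yet.
    model.load_state_dict(state, strict=False)
if os.path.exists(vocab_path):
    tokenizer = BertTokenizer(vocab_path)
```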
In the paper, the authors conduct an OOD experiment for NLP using the CLINC OOS intent detection benchmark dataset. The OOS dataset contains data for 150 in-domain services with 150 training sentences per domain, plus 1,500 natural out-of-domain utterances. You can download the dataset at Link.
Original dataset paper and GitHub: Paper Link, Git Link
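The splits live under `dataset/clinc_oos/` as plain CSV files, so they can be read with the standard library. A sketch, with one caveat: the column names `text` and `intent` are assumptions, so check the actual CSV header before using this.

```python
import csv

def load_split(path):
    """Read one CLINC OOS split into parallel lists of utterances and labels.

    Column names "text" and "intent" are assumptions; adjust to the real header.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    texts = [r["text"] for r in rows]
    labels = [r["intent"] for r in rows]
    return texts, labels
```

Call it once per file, e.g. `load_split("dataset/clinc_oos/train.csv")`.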
python main.py --train_or_test train --device gpu --gpu 0
python main.py --train_or_test test --device gpu --gpu 0
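At test time the detector follows the idea of [1]: fit class-conditional Gaussians with a shared (tied) covariance on penultimate-layer features, then score a sample by its Mahalanobis distance to the nearest class mean. The sketch below shows only that scoring step; extracting BERT features (e.g. the [CLS] vector) is assumed to happen elsewhere, and the function names are illustrative, not the repo's API.

```python
import numpy as np

def fit_gaussians(feats, labels):
    """Estimate per-class means and a shared precision matrix from features."""
    classes = np.unique(labels)
    means = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    # Center every sample by its own class mean, then pool the covariance.
    centered = feats - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(feats)  # shared (tied) covariance
    prec = np.linalg.pinv(cov)               # precision matrix
    return means, prec

def mahalanobis_score(x, means, prec):
    """Confidence score: negative squared distance to the closest class mean."""
    diffs = means - x                                  # (num_classes, dim)
    d2 = np.einsum("cd,de,ce->c", diffs, prec, diffs)  # per-class squared dist
    return -d2.min()  # higher = more in-distribution
```

A sample is flagged as OOD when its score falls below a threshold chosen on validation data.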
[1] https://arxiv.org/pdf/1807.03888.pdf
[2] https://github.com/pokaxpoka/deep_Mahalanobis_detector