🔥 Mahalanobis-BERT 🔥

Reimplementation of "A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks"

The code is based on the official repo (PyTorch) and Hugging Face.

Original Paper: Link

Installation ☕

Training environment: Ubuntu 18.04, Python 3.6

pip3 install torch torchvision torchaudio
pip install scikit-learn

Download the bert-base-uncased checkpoint from huggingface-ckpt
Download the bert-base-uncased vocab file from huggingface-vocab
Download the CLINC OOS intent detection benchmark dataset from tensorflow-dataset

After downloading, arrange the files in the following directory structure (a short loading sketch follows the tree):

Mahalanobis-BERT
ㄴckpt
  ㄴbert-base-uncased-pytorch_model.bin
ㄴdataset
  ㄴclinc_oos
    ㄴtrain.csv
    ㄴval.csv
    ㄴtest.csv
    ㄴtest_ood.csv
  ㄴvocab
    ㄴbert-base-uncased-vocab.txt
ㄴmodels
...
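
With the files in place as above, the checkpoint and vocab can be loaded roughly as follows. This is a minimal sketch assuming the `transformers` library; depending on how the checkpoint was exported, the state-dict keys may carry a `bert.` prefix or include pretraining-head weights, which the sketch tolerates with a prefix strip and `strict=False`.

```python
import torch
from transformers import BertConfig, BertModel, BertTokenizer

# Paths follow the directory layout shown above.
tokenizer = BertTokenizer("dataset/vocab/bert-base-uncased-vocab.txt")

config = BertConfig()  # default values correspond to bert-base-uncased
model = BertModel(config)

state_dict = torch.load("ckpt/bert-base-uncased-pytorch_model.bin", map_location="cpu")
# Some exported checkpoints prefix encoder weights with "bert."; strip it if present.
state_dict = {k[len("bert."):] if k.startswith("bert.") else k: v
              for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=False)  # skip any pretraining-head weights
model.eval()
```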

Dataset Info 📖

In their paper, the authors conducted an OOD experiment for NLP using the CLINC OOS intent detection benchmark dataset. The OOS dataset contains 150 in-domain services with 150 training sentences per domain, plus 1,500 natural out-of-domain utterances. You can download the dataset at Link.
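
The CSV splits can be inspected directly before training. A minimal sketch assuming pandas; the column names vary between exports, so check the header row first.

```python
import pandas as pd

# Paths follow the directory layout above; column names are assumptions.
train = pd.read_csv("dataset/clinc_oos/train.csv")
test_ood = pd.read_csv("dataset/clinc_oos/test_ood.csv")

print(train.shape, test_ood.shape)  # sanity-check split sizes
print(train.head())                 # confirm the actual column names
```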

Original dataset paper and GitHub: Paper Link, Git Link

Run 🌟

Train

python main.py --train_or_test train --device gpu --gpu 0

Test

python main.py --train_or_test test --device gpu --gpu 0
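
During the test phase, the Mahalanobis detector from the paper scores each input by its distance to class-conditional Gaussians fitted on in-distribution features (per-class means plus a shared, tied covariance). The sketch below shows only that scoring step in NumPy, assuming BERT sentence features have already been extracted; the function names are illustrative and not taken from `main.py`.

```python
import numpy as np

def fit_class_gaussians(features, labels, num_classes):
    """Fit class-conditional Gaussian means with a shared (tied) covariance,
    as in the Mahalanobis detector [1]."""
    means = np.stack([features[labels == c].mean(axis=0) for c in range(num_classes)])
    centered = features - means[labels]          # subtract each sample's class mean
    cov = centered.T @ centered / len(features)  # shared covariance estimate
    precision = np.linalg.pinv(cov)              # pseudo-inverse for numerical safety
    return means, precision

def mahalanobis_confidence(x, means, precision):
    """Confidence = negative minimum squared Mahalanobis distance to any class
    mean; lower values suggest an out-of-distribution input."""
    diffs = means - x                            # (num_classes, feature_dim)
    dists = np.einsum("cd,de,ce->c", diffs, precision, diffs)
    return -dists.min()
```

An OOS decision then comes from thresholding this score on a validation split; the input-preprocessing and layer-ensembling refinements from the official repo [2] are omitted here for brevity.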

References

[1] https://arxiv.org/pdf/1807.03888.pdf
[2] https://github.com/pokaxpoka/deep_Mahalanobis_detector
