MedNLI Is Not Immune: Natural Language Inference Artifacts in the Clinical Domain
This repository contains the source code required to reproduce the analysis presented in the paper "MedNLI Is Not Immune: Natural Language Inference Artifacts in the Clinical Domain", appearing at ACL-IJCNLP 2021.
Data:
MedNLI can be downloaded from PhysioNet, though credentialed access is required. After downloading the data, place the resulting directory under the project root directory. The expected layout is:
.
├── mednli
│   └── 1.0.0
│       ├── LICENSE.txt
│       ├── README.txt
│       ├── SHA256SUMS.txt
│       ├── index.html
│       ├── mli_dev_v1.jsonl
│       ├── mli_test_v1.jsonl
│       └── mli_train_v1.jsonl
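Once the files are in place, a quick sanity check can confirm the layout. The sketch below is not part of this repository. It assumes the JSONL records use an SNLI-style `gold_label` field, which should be verified against the README.txt shipped with the data.

```python
import json
from collections import Counter
from pathlib import Path

# Paths mirror the directory tree above.
DATA_DIR = Path("mednli") / "1.0.0"
SPLITS = {
    "train": "mli_train_v1.jsonl",
    "dev": "mli_dev_v1.jsonl",
    "test": "mli_test_v1.jsonl",
}


def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def label_counts(records, label_key="gold_label"):
    """Count instances per label; `gold_label` is an assumed field name."""
    return Counter(record[label_key] for record in records)


if __name__ == "__main__":
    for split, filename in SPLITS.items():
        path = DATA_DIR / filename
        if not path.exists():
            print(f"{split}: missing {path}")
            continue
        print(split, dict(label_counts(read_jsonl(path))))
```

Run it from the project root so that the relative path resolves.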
Set-up:
Conda environment:
conda env create -f environment.yml
conda activate clinical_nli
scispaCy language model:
General usage is: pip install <Model URL>; en_core_sci_sm and en_core_sci_lg are both used in this pipeline:
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_sm-0.4.0.tar.gz
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.4.0/en_core_sci_lg-0.4.0.tar.gz
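Because pip-installed spaCy models are ordinary Python packages, their presence can be checked before launching the pipeline. This is a minimal sketch using only the standard library; the helper name `missing_packages` is illustrative and does not come from this repository.

```python
import importlib.util

# The two scispaCy models named above.
REQUIRED_MODELS = ("en_core_sci_sm", "en_core_sci_lg")


def missing_packages(names):
    """Return the names that are not importable as top-level packages."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODELS)
    print("missing models:", missing if missing else "none")
```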
fastText MIMIC-III embeddings:
Referenced in the original MedNLI paper by Romanov and Shivade (2018); available on the associated repo or via:
wget https://mednli.blob.core.windows.net/shared/word_embeddings/wiki_en_mimic.fastText.no_clean.300d.pickled
Configuration file:
./example_cfg.ini
: Defines paths and task-specific hyper-parameters.
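An .ini file of this kind can be read with Python's standard configparser module. The section and key names below (`paths`, `fasttext`, `data_dir`, `epochs`, `lr`) are hypothetical placeholders chosen for illustration; consult example_cfg.ini for the names the scripts actually expect.

```python
import configparser

# Hypothetical contents, for illustration only; the real file may differ.
SAMPLE_CFG = """
[paths]
data_dir = ./mednli/1.0.0

[fasttext]
epochs = 5
lr = 0.1
"""


def load_cfg(text):
    """Parse .ini-formatted text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


cfg = load_cfg(SAMPLE_CFG)
data_dir = cfg.get("paths", "data_dir")    # values are strings by default
epochs = cfg.getint("fasttext", "epochs")  # typed getters convert on read
lr = cfg.getfloat("fasttext", "lr")
print(data_dir, epochs, lr)
```

To read a file on disk instead of a string, call `parser.read("example_cfg.ini")`.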
Shell and python scripts:
From the project root directory:
cd ./scripts && sh parse_embeds_aflite.sh
Note: parse_embeds_aflite.sh has 4 boolean flags:
- fastText: parse MedNLI input files (JSON) and create fastText-formatted .txt files
- ftAllSubsets: create a single fastText-formatted .txt file containing instances from all splits (e.g., train, dev, test). Useful for AFLite.
- embeddings: recovers embeddings for each instance in the corpus (language model is configurable)
- aflite: runs adversarial filtering algorithm AfLite adapted from Sakaguchi et al. (2019); yields easy and difficult partitions
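For readers unfamiliar with the fastText supervised format, each line carries a label prefixed with `__label__`, followed by the text. The sketch below shows one way a record could be converted. It is not the repository's parser: the field names (`sentence1`, `sentence2`, `gold_label`) and the premise-then-hypothesis ordering are assumptions, and the example record is invented.

```python
import json


def to_fasttext_line(record, include_premise=False):
    """Format one NLI record as a fastText supervised-training line."""
    text = record["sentence2"]  # hypothesis (assumed field name)
    if include_premise:
        # Premise first, then hypothesis (assumed ordering).
        text = record["sentence1"] + " " + text
    # Collapse whitespace so every instance stays on a single line.
    text = " ".join(text.split())
    return f"__label__{record['gold_label']} {text}"


# Invented example record, not taken from MedNLI.
example = json.loads(
    '{"sentence1": "Patient has a fever.", '
    '"sentence2": "The patient\\nis febrile.", "gold_label": "entailment"}'
)
print(to_fasttext_line(example))
```

Setting `include_premise=False` corresponds to a hypothesis-only input; the repository's own files may use a different separator or preprocessing.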
To replicate reported results, after running sh parse_embeds_aflite.sh with all flags set to True, run:
- sh ft_baseline.sh: computes fastText baseline results; if the evalAflite flag is set to True, also computes fastText results for the AfLite easy and difficult partitions.
- sh lexical.sh: computes n-gram counts, PMI, and mean/median hypothesis length by label.
- sh semantic.sh: uses scispaCy to link named entities to UMLS; conducts statistical hypothesis testing re: heuristics.
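As background for the PMI step, pointwise mutual information between a token and a label is log(p(token, label) / (p(token) p(label))). The following is a minimal unsmoothed sketch of that definition, counting each token once per hypothesis. The repository's script may apply smoothing or different tokenization, and the toy examples are invented rather than drawn from MedNLI.

```python
import math
from collections import Counter


def pmi_by_label(examples):
    """Compute PMI for every observed (token, label) pair.

    `examples` is an iterable of (tokens, label) pairs.
    """
    joint = Counter()
    for tokens, label in examples:
        for token in set(tokens):  # count each token once per hypothesis
            joint[(token, label)] += 1

    token_totals, label_totals = Counter(), Counter()
    for (token, label), count in joint.items():
        token_totals[token] += count
        label_totals[label] += count
    total = sum(joint.values())

    # log( (c_tl / N) / ((c_t / N) * (c_l / N)) ) = log( c_tl * N / (c_t * c_l) )
    return {
        (token, label): math.log(
            count * total / (token_totals[token] * label_totals[label])
        )
        for (token, label), count in joint.items()
    }


# Invented toy hypotheses.
toy = [
    (["no", "fever"], "contradiction"),
    (["has", "fever"], "entailment"),
    (["no", "pain"], "contradiction"),
]
scores = pmi_by_label(toy)
```

A positive score means the token co-occurs with the label more often than independence would predict; in the toy data, "no" scores positively for "contradiction".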
From the project root directory, cd ./src/utils and run:
- python get_hyp_len.py: Computes hypothesis length for two versions of the corpus (multi-word entities merged and separate).
- python get_partition_ids.py: Creates 2 arrays with instance ids for the easy and difficult AfLite partitions.
  - instance ids will have the format <split><numeric_id>
  - underlying text can be recovered by joining against the ./mednli/fastText/mli_all_w_premise_v1_sep.txt file.
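Before joining, it can help to split an instance id back into its parts. This sketch assumes the split names are `train`, `dev`, and `test` and that the id is the split name immediately followed by digits; the sample ids are hypothetical.

```python
import re

# Assumed pattern for <split><numeric_id>, e.g. "train42" (hypothetical id).
ID_PATTERN = re.compile(r"^(train|dev|test)(\d+)$")


def split_instance_id(instance_id):
    """Return (split, numeric_id) for an id such as 'train42'."""
    match = ID_PATTERN.match(instance_id)
    if match is None:
        raise ValueError(f"unrecognized instance id: {instance_id!r}")
    return match.group(1), int(match.group(2))


print(split_instance_id("train42"))
```

The layout of the text file used for the join is not described here, so that step is left out of the sketch.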
If you find this code useful in your research, please consider citing:
@inproceedings{herlihy-rudinger-2021-mednli,
title = "{M}ed{NLI} Is Not Immune: {N}atural Language Inference Artifacts in the Clinical Domain",
author = "Herlihy, Christine and
Rudinger, Rachel",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-short.129",
doi = "10.18653/v1/2021.acl-short.129",
pages = "1020--1027",
}