
Evidence Selection as a Token-Level Prediction Task

This repository contains the code and models for the paper (Stammbach, 2021).

Installation

Assuming Anaconda and Linux, the environment can be installed with the following commands:

conda create -n FEVER_bigbird python=3.6
conda activate FEVER_bigbird

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

Models

The trained PyTorch models can be downloaded here:

Run the models on sample data

python src/main.py --do_predict --model_name sentence-selection-bigbird-base --eval_file sample_data.jsonl --predict_filename predictions_sentence_retrieval.csv

sample_data.jsonl points to a file where each line is an example of a (claim, Wiki-page) pair with the following fields (a minimal example follows the list):

  • id # the claim ID
  • claim # the claim
  • page # the page title
  • sentences # a list -- essentially the "lines" in the official FEVER wiki-pages for a given document (where the document is split by "\n")
  • label_list # a list, 1 if a sentence is part of any annotated evidence set for a given claim, 0 otherwise
  • sentence_IDS # a list, np.arange(len(sentences))
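
For illustration, one line of such a file could be constructed as in the following sketch; the claim, page, sentences, and labels are hypothetical placeholders:

import json

# One (claim, Wiki-page) pair in the format described above;
# all field values here are hypothetical placeholders.
example = {
    "id": 12345,
    "claim": "The 49ers played their 2014 home games at Levi's Stadium.",
    "page": "2014_San_Francisco_49ers_season",
    "sentences": [
        "The 2014 season was the San Francisco 49ers' 65th in the National Football League.",
        "It was their first season playing home games at Levi's Stadium.",
    ],
    "label_list": [0, 1],     # the second sentence is annotated evidence
    "sentence_IDS": [0, 1],   # np.arange(len(sentences))
}

with open("sample_data.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")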

The output is a dataframe where, for each sentence predicted by the model, we store the following fields (a loading sketch follows the list):

  • claim_id
  • page_sentence # a tuple (Wikipage_Title, sentence_ID), for example ('2014_San_Francisco_49ers_season', 3)
  • y # 1 if label_list above was 1, 0 otherwise
  • predictions # token-level predictions for this sentence
  • score # np.mean(predictions), model is confident that this sentence is evidence if score > 0
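
Assuming the dataframe is saved as a CSV with exactly these columns, the confident evidence sentences can be read back along the lines of this sketch:

import pandas as pd

# Load the predictions written by src/main.py.
df = pd.read_csv("predictions_sentence_retrieval.csv")

# Keep the sentences the model is confident about (score > 0)
# and group the (Wikipage_Title, sentence_ID) tuples per claim.
evidence = df[df["score"] > 0]
for claim_id, group in evidence.groupby("claim_id"):
    print(claim_id, group["page_sentence"].tolist())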

Re-train the models

Point to a train_file and an eval_file, both in the format described above, and add the --do_train flag:

python src/main.py --do_train --do_predict --model_name sentence-selection-bigbird-base --eval_file sample_data.jsonl --train_file sample_data.jsonl --predict_filename predictions_sentence_retrieval.csv

The pipeline

  • takes a first pass over all (claim, WikiPage) pairs, where the Wiki pages are predicted by (Hanselowski et al., 2018) and the FEVER baseline
  • extracts all sentences it is confident are evidence in that pass; the model input is [CLS] claim [SEP] WikiPage [SEP]
  • retrieves conditioned evidence as explained in (Stammbach and Neumann, 2019)
  • retrieves hyperlinks from the evidence sentences and takes a second pass over all (claim, hyperlink) pairs, where the model input is [CLS] claim, evidence_sentence [SEP] HyperlinkPage [SEP]
  • sorts all predicted evidence sentences for a claim in descending order of score (see the sketch after this list)
  • takes the five highest-scoring sentences for each claim and concatenates them
  • predicts a label for each (claim, retrieved_evidence) pair using the RTE model (trained with an outdated Hugging Face sequence-classification demo script)
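
A minimal sketch of the sorting and top-5 selection steps, assuming the prediction dataframe format described above; looking up the sentence texts for concatenation would still require the Wiki dump:

import pandas as pd

# Sort all predicted evidence sentences by score (descending) and keep
# the five highest-scoring sentences for each claim.
df = pd.read_csv("predictions_sentence_retrieval.csv")
top5 = df.sort_values("score", ascending=False).groupby("claim_id").head(5)

# These (Wikipage_Title, sentence_ID) tuples identify the sentences that
# would be concatenated and passed to the RTE model.
print(top5.groupby("claim_id")["page_sentence"].apply(list))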

Questions

If anything does not work or is unclear, please don't hesitate to contact the authors.
