
Evidence Selection as a Token-Level Prediction Task

This repository contains the code and models for the paper (Stammbach, 2021).

Installation

Assuming Anaconda and Linux, the environment can be installed with the following commands:

conda create -n FEVER_bigbird python=3.6
conda activate FEVER_bigbird

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt

Models

The trained PyTorch models can be downloaded here:

Run the models on sample data

python src/main.py --do_predict --model_name sentence-selection-bigbird-base --eval_file sample_data.jsonl --predict_filename predictions_sentence_retrieval.csv

sample_data.jsonl points to a file where each line is an example of a (claim, Wiki-page) pair with the following fields (a minimal example follows the list):

  • id # the claim ID
  • claim # the claim
  • page # the page title
  • sentences # a list -- essentially the "lines" in the official FEVER wiki-pages for a given document (where the document is split by "\n")
  • label_list # a list, 1 if a sentence is part of any annotated evidence set for a given claim, 0 otherwise
  • sentence_IDS # a list, np.arange(len(sentences))
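
For illustration, one line of such a file could be constructed as in the following sketch; the claim, page, sentences, and labels are hypothetical placeholders:

import json

# One (claim, Wiki-page) pair in the format described above;
# all field values here are hypothetical placeholders.
example = {
    "id": 12345,
    "claim": "The 49ers played their 2014 home games at Levi's Stadium.",
    "page": "2014_San_Francisco_49ers_season",
    "sentences": [
        "The 2014 season was the San Francisco 49ers' 65th in the National Football League.",
        "It was their first season playing home games at Levi's Stadium.",
    ],
    "label_list": [0, 1],     # the second sentence is annotated evidence
    "sentence_IDS": [0, 1],   # np.arange(len(sentences))
}

with open("sample_data.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")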

The output is a dataframe where, for each sentence predicted by the model, we store the following fields (a loading sketch follows the list):

  • claim_id
  • page_sentence # a tuple (Wikipage_Title, sentence_ID), for example ('2014_San_Francisco_49ers_season', 3)
  • y # 1 if label_list above was 1, 0 otherwise
  • predictions # token-level predictions for this sentence
  • score # np.mean(predictions), model is confident that this sentence is evidence if score > 0
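
Assuming the dataframe is saved as a CSV with exactly these columns, the confident evidence sentences can be read back along the lines of this sketch:

import pandas as pd

# Load the predictions written by src/main.py.
df = pd.read_csv("predictions_sentence_retrieval.csv")

# Keep the sentences the model is confident about (score > 0)
# and group the (Wikipage_Title, sentence_ID) tuples per claim.
evidence = df[df["score"] > 0]
for claim_id, group in evidence.groupby("claim_id"):
    print(claim_id, group["page_sentence"].tolist())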

Re-train the models

Point to a train_file and an eval_file, both in the format described above, and add the --do_train flag:

python src/main.py --do_train --do_predict --model_name sentence-selection-bigbird-base --eval_file sample_data.jsonl --train_file sample_data.jsonl --predict_filename predictions_sentence_retrieval.csv

The pipeline

  • takes a first pass over all (claim, WikiPage) pairs, where the Wiki pages are predicted by (Hanselowski et al., 2018) and the FEVER baseline
  • extracts all sentences it is confident are evidence in that pass; the model input is [CLS] claim [SEP] WikiPage [SEP]
  • retrieves conditioned evidence as explained in (Stammbach and Neumann, 2019)
  • retrieves hyperlinks from the evidence sentences and takes a second pass over all (claim, hyperlink) pairs, where the model input is [CLS] claim, evidence_sentence [SEP] HyperlinkPage [SEP]
  • sorts all predicted evidence sentences for a claim in descending order of score (see the sketch after this list)
  • takes the five highest-scoring sentences for each claim and concatenates them
  • predicts a label for each (claim, retrieved_evidence) pair using the RTE model (trained with an outdated Hugging Face sequence-classification demo script)
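
A minimal sketch of the sorting and top-5 selection steps, assuming the prediction dataframe format described above; looking up the sentence texts for concatenation would still require the Wiki dump:

import pandas as pd

# Sort all predicted evidence sentences by score (descending) and keep
# the five highest-scoring sentences for each claim.
df = pd.read_csv("predictions_sentence_retrieval.csv")
top5 = df.sort_values("score", ascending=False).groupby("claim_id").head(5)

# These (Wikipage_Title, sentence_ID) tuples identify the sentences that
# would be concatenated and passed to the RTE model.
print(top5.groupby("claim_id")["page_sentence"].apply(list))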

Questions

If anything does not work or is unclear, please don't hesitate to contact the authors.
