factoid_QA_with_distant_spervision
Code for Factoid Question Answering With Distant Supervision.
I am still cleaning up the code for upload, and more documentation will be added.
Requirements
- GPU and CUDA 8 are required
- python >=3.5
- pytorch 0.3.0
- pandas
- msgpack
- spacy 1.x
- cupy
- pynvrtc
- jieba
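Before training, it can help to confirm that the packages listed above are importable. The following is a minimal sketch, not part of this repository: the helper name find_missing_modules is hypothetical, and the import names (for example torch for pytorch) are assumed from the package list. It does not check package versions, the GPU, or the CUDA installation.

```python
import importlib.util

# Import names assumed for the packages listed above (e.g. pytorch -> torch).
REQUIRED_MODULES = ["torch", "pandas", "msgpack", "spacy", "cupy", "pynvrtc", "jieba"]


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be located for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Any names reported as missing can then be installed with the package manager of your choice.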
Download Data
Please download the data files from Google Drive and put them under the "dat" directory. Specifically, download these four files:
questions_dis_data_150htmls_using_abstext.txt
triple_weight_by_search.txt
new_mined_paraphrase0124.txt
WebQA.v1.0.tar.gz # is it proper to upload this dataset?
Then unzip the WebQA data with tar -zxvf WebQA.v1.0.tar.gz.
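The download and extraction steps above can also be checked from Python. This is a rough sketch under stated assumptions: the helper names missing_data_files and extract_webqa are hypothetical, and extracting the archive inside the "dat" directory is an assumption, since the instructions do not say where the tar command should be run.

```python
import os
import tarfile

DATA_DIR = "dat"  # directory name taken from the instructions above
EXPECTED_FILES = [
    "questions_dis_data_150htmls_using_abstext.txt",
    "triple_weight_by_search.txt",
    "new_mined_paraphrase0124.txt",
    "WebQA.v1.0.tar.gz",
]


def missing_data_files(data_dir=DATA_DIR, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    return [f for f in expected if not os.path.isfile(os.path.join(data_dir, f))]


def extract_webqa(data_dir=DATA_DIR, archive_name="WebQA.v1.0.tar.gz"):
    """Extract the WebQA archive into data_dir (assumed target location)."""
    archive_path = os.path.join(data_dir, archive_name)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(path=data_dir)


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        extract_webqa()
        print("All data files found; WebQA archive extracted.")
```

Run the script from the directory that contains "dat"; it only extracts the archive once all four files are present.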
Model training
Train the model by running
cd DSRC
mkdir logs
python train_model.py
Please refer to parameters.py for configuration details, where train_idx corresponds to the different experimental configurations in the paper.
Automatic training data generation via distant supervision
Besides the generated training data, we also release the data used for training data generation, training sample selection, and distant paraphrase mining.
Training data generation via distant supervision
Coming soon.
Training sample selection and distant paraphrase mining
Credits
Author of SRU: Tao Lei.
Author of the Document Reader model: Danqi Chen.
Author of the original PyTorch implementation: Runqi Yang.
Most of the PyTorch model code is borrowed from Facebook/ParlAI under a BSD-3 license.