facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

Questions about file generation

donno2048 opened this issue

Not really an issue; I'm just wondering where the .npz file came from and how it was generated...

Are you referring to data/datasets/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz?
If so, that file is generated via this script: python scripts/retriever/build_tfidf.py /path/to/doc/db /path/to/output/dir
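For what it's worth, the parameters encoded in that filename appear to map directly onto the script's flags. A minimal sketch, assuming the flag names (--ngram, --hash-size, --tokenizer) match the script's current argparse options (worth double-checking against your checkout):

```
# Build the sparse TF-IDF ranker from a SQLite document DB.
# These settings mirror the ones encoded in the distributed
# filename: 2-grams, 2^24 = 16777216 hash buckets, and the
# simple tokenizer.
python scripts/retriever/build_tfidf.py \
    /path/to/doc/db \
    /path/to/output/dir \
    --ngram 2 \
    --hash-size 16777216 \
    --tokenizer simple
```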

There's more info on the tf-idf retriever here: https://github.com/facebookresearch/DrQA/tree/master/scripts/retriever#building-the-tf-idf-n-grams
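A quick way to sanity-check the built .npz is to query it interactively. A sketch, assuming scripts/retriever/interactive.py takes the model path via --model as described in the main README, and using a hypothetical output path (the db name prefix of the .npz depends on what your doc DB was called):

```
# Load the freshly built TF-IDF model and try a few queries.
python scripts/retriever/interactive.py \
    --model /path/to/output/dir/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz
```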

Thanks!

I'm reopening this issue to ask the same question about the single model and the multitask model, which are described as:

Model trained only on SQuAD, evaluated in the SQuAD setting

and

Model trained with distant supervision without NER/POS/lemma features, evaluated on multiple datasets (test sets, dev set for SQuAD) in the full Wikipedia setting

From the README, I couldn't figure out how these models were generated.

The models are created with this script: python scripts/reader/train.py

That script accepts many different parameters; you can read more in this README: https://github.com/facebookresearch/DrQA/tree/master/scripts/reader
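In case it saves someone time, here's a rough sketch of the two set-ups as I understand them from that README. The flags are assumptions to verify against the argparse options in scripts/reader/train.py (I believe --embedding-file, --tune-partial, and the boolean --use-pos/--use-ner/--use-lemma switches exist there), and the multitask training file path is a hypothetical placeholder for data you'd prepare separately (e.g. with the distant-supervision scripts):

```
# Single model: trained only on SQuAD (the default data files),
# fine-tuning the embeddings of the 1000 most frequent words.
python scripts/reader/train.py \
    --embedding-file glove.840B.300d.txt \
    --tune-partial 1000

# Multitask model: same script, pointed at SQuAD plus the
# distantly-supervised datasets, with NER/POS/lemma features off.
python scripts/reader/train.py \
    --embedding-file glove.840B.300d.txt \
    --train-file /path/to/multitask-train-processed.txt \
    --use-pos False --use-ner False --use-lemma False
```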

Thank you, sorry for the trouble...