facebookresearch / DrQA

Reading Wikipedia to Answer Open-Domain Questions

Questions about file generation

donno2048 opened this issue

Not really an issue; I'm just wondering where the .npz file came from and how it was generated...

Are you referring to data/datasets/wikipedia/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz?
If so, that file is generated via this script: python scripts/retriever/build_tfidf.py /path/to/doc/db /path/to/output/dir
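For what it's worth, the parameters encoded in that filename appear to map directly onto the script's flags. A minimal sketch, assuming the flag names (--ngram, --hash-size, --tokenizer) match the script's current argparse options (worth double-checking against your checkout):

```
# Build the sparse TF-IDF ranker from a SQLite document DB.
# These settings mirror the ones encoded in the distributed
# filename: 2-grams, 2^24 = 16777216 hash buckets, and the
# simple tokenizer.
python scripts/retriever/build_tfidf.py \
    /path/to/doc/db \
    /path/to/output/dir \
    --ngram 2 \
    --hash-size 16777216 \
    --tokenizer simple
```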

There's more info on the tf-idf retriever here: https://github.com/facebookresearch/DrQA/tree/master/scripts/retriever#building-the-tf-idf-n-grams
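A quick way to sanity-check the built .npz is to query it interactively. A sketch, assuming scripts/retriever/interactive.py takes the model path via --model as described in the main README, and using a hypothetical output path (the db name prefix of the .npz depends on what your doc DB was called):

```
# Load the freshly built TF-IDF model and try a few queries.
python scripts/retriever/interactive.py \
    --model /path/to/output/dir/docs-tfidf-ngram=2-hash=16777216-tokenizer=simple.npz
```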

Thanks!

I'm reopening this issue to ask the same question about the single model and the multitask model, which are described as:

Model trained only on SQuAD, evaluated in the SQuAD setting

and

Model trained with distant supervision without NER/POS/lemma features, evaluated on multiple datasets (test sets, dev set for SQuAD) in the full Wikipedia setting

From the README, I couldn't figure out how these models were generated.

The models are created with this script: python scripts/reader/train.py

That script accepts many different parameters; you can read more in this README: https://github.com/facebookresearch/DrQA/tree/master/scripts/reader
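In case it saves someone time, here's a rough sketch of the two set-ups as I understand them from that README. The flags are assumptions to verify against the argparse options in scripts/reader/train.py (I believe --embedding-file, --tune-partial, and the boolean --use-pos/--use-ner/--use-lemma switches exist there), and the multitask training file path is a hypothetical placeholder for data you'd prepare separately (e.g. with the distant-supervision scripts):

```
# Single model: trained only on SQuAD (the default data files),
# fine-tuning the embeddings of the 1000 most frequent words.
python scripts/reader/train.py \
    --embedding-file glove.840B.300d.txt \
    --tune-partial 1000

# Multitask model: same script, pointed at SQuAD plus the
# distantly-supervised datasets, with NER/POS/lemma features off.
python scripts/reader/train.py \
    --embedding-file glove.840B.300d.txt \
    --train-file /path/to/multitask-train-processed.txt \
    --use-pos False --use-ner False --use-lemma False
```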

Thank you, sorry for the trouble...