Merge relevant docs when generating Dev dataset

Question

jingtaozhan opened this issue 5 years ago

I'm having trouble understanding the following code from convert_msmarco_to_duobert_tfrecord.py.

        qrels = None
        if set_name != 'test':
            qrels = load_qrels(path=qrels_path)

        queries = load_queries(queries_path)
        run = load_run(path=run_path)
        data = merge(qrels=qrels, run=run, queries=queries)

When the TFRecord for the dev set is generated, the relevant docs from the qrels are merged into the data together with the ranked list output by monoBERT. I'm confused about this. Does it mean that the results on the dev set aren't comparable with the results on the test set?

Jingtao Zhan · Answer 1 · Mon Feb 10 2020 16:54:09 GMT+0800 (China Standard Time)

Oh, I get it. The qrels are merged in only to generate the labels; the relevant docs are not added as extra candidates.
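
For anyone else who gets confused by this, here is a minimal sketch of what a label-only merge could look like. It is not the repository's actual merge function: the name merge_sketch, the dict-based input shapes, and the example ids are all assumptions made for illustration. The point it shows is that candidates come only from the run, and the qrels are consulted only to mark each candidate as relevant or not.

```python
from collections import OrderedDict


def merge_sketch(qrels, run, queries):
    """Hypothetical merge: attach labels from qrels to the candidates in run.

    Assumed input shapes (not taken from the repository):
      qrels:   dict query_id -> set of relevant doc_ids, or None for the test set
      run:     dict query_id -> list of candidate doc_ids in ranked order
      queries: dict query_id -> query text
    """
    data = OrderedDict()
    for query_id, candidates in run.items():
        # Without qrels (test set) every candidate gets label 0.
        relevant = qrels.get(query_id, set()) if qrels is not None else set()
        # Candidates come only from the run; qrels never add a doc here.
        labels = [1 if doc_id in relevant else 0 for doc_id in candidates]
        data[query_id] = (queries[query_id], list(candidates), labels)
    return data


# Made-up example: d9 is relevant but was not retrieved by the first stage.
qrels = {"q1": {"d2", "d9"}}
run = {"q1": ["d1", "d2", "d3"]}
queries = {"q1": "example query"}

data = merge_sketch(qrels, run, queries)
print(data["q1"])  # ('example query', ['d1', 'd2', 'd3'], [0, 1, 0])
```

In this sketch d9 never shows up among the candidates even though it is in the qrels, so the candidate list for dev is built the same way as for test and only the labels differ.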