castorini / duobert

Multi-stage passage ranking: monoBERT + duoBERT

Merge relevant docs when generating Dev dataset

jingtaozhan opened this issue

I'm having trouble understanding the following code from convert_msmarco_to_duobert_tfrecord.py.

        qrels = None
        if set_name != 'test':
            qrels = load_qrels(path=qrels_path)

        queries = load_queries(queries_path)
        run = load_run(path=run_path)
        data = merge(qrels=qrels, run=run, queries=queries)

When the TFRecord file for the dev set is generated, the relevant docs from the qrels are merged into the data together with the ranked list output by monoBERT. I'm confused about this. Does it mean that the results on the dev set aren't comparable with the results on the test set?

Oh, I get it. The qrels are only used to generate the labels; the relevant docs are not added to the candidates.
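
For anyone else who lands here, this is a minimal sketch of that distinction. It is not the repository's actual code: `merge_sketch` and `label_candidates` are hypothetical stand-ins, and the assumption is that `qrels` maps a query id to a set of relevant doc ids and `run` maps a query id to the ranked candidate list from monoBERT.

```python
from collections import OrderedDict


def merge_sketch(qrels, run, queries):
    """Hypothetical stand-in for merge(): pair each query's candidates with its qrels."""
    data = OrderedDict()
    for query_id, candidate_doc_ids in run.items():
        # qrels is None for the test set, so no labels are available there.
        relevant = set(qrels.get(query_id, ())) if qrels else set()
        # The candidate list is copied from the run unchanged.
        data[query_id] = (queries[query_id], relevant, list(candidate_doc_ids))
    return data


def label_candidates(relevant_doc_ids, candidate_doc_ids):
    """Return 1 for each candidate that appears in the qrels, else 0."""
    return [int(doc_id in relevant_doc_ids) for doc_id in candidate_doc_ids]


# Toy data (made up for illustration).
queries = {"q1": "what is duobert"}
run = {"q1": ["d3", "d7", "d9"]}
qrels = {"q1": {"d7", "d42"}}  # d42 is relevant but was not retrieved

query, relevant, candidates = merge_sketch(qrels, run, queries)["q1"]
labels = label_candidates(relevant, candidates)
print(candidates, labels)
```

In this sketch `d42` is in the qrels but never enters the candidate list, so the reranker sees the same candidates it would see without qrels. Under that assumption, dev and test candidates are built the same way and only the labels differ.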