castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Home page: http://pyserini.io/

How to build collections using MS MARCO and BEIR

BeastyZ opened this issue

Hi, I see that you provide prebuilt Faiss indexes. Could you also provide the collections used to build them? If that is not convenient, could you tell us which fields of each dataset are used to build the collections?
I want to use these collections to build an index for a new retriever.

Hi, we get the collections from the BEIR GitHub repo: https://github.com/beir-cellar/beir
The title is prepended to the text body when it is available.
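For anyone following along, here is a minimal sketch of that "prepend the title when available" step. It assumes the BEIR `corpus.jsonl` layout (`_id`, `title`, `text` per line) and writes Pyserini JsonCollection-style records (`id`, `contents`). The single-space separator between title and body is an assumption; the thread does not say which separator was actually used, so adjust `sep` if your retriever expects something else.

```python
import json
from typing import Iterable, Iterator


def to_pyserini_doc(record: dict, sep: str = " ") -> dict:
    """Convert one BEIR corpus record into an {"id", "contents"} document.

    The title is prepended to the body only when it is non-empty.
    ASSUMPTION: `sep` (a single space by default) is not confirmed by the thread.
    """
    title = (record.get("title") or "").strip()
    text = (record.get("text") or "").strip()
    # Join only the non-empty parts, so a missing title leaves just the body.
    contents = sep.join(part for part in (title, text) if part)
    return {"id": record["_id"], "contents": contents}


def convert_corpus(lines: Iterable[str], sep: str = " ") -> Iterator[str]:
    """Yield JSON lines (id/contents) from BEIR corpus.jsonl lines, skipping blanks."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.dumps(to_pyserini_doc(json.loads(line), sep=sep))


if __name__ == "__main__":
    # Tiny made-up sample in the BEIR corpus format, for illustration only.
    sample = [
        '{"_id": "d1", "title": "Aspirin", "text": "A common painkiller."}',
        '{"_id": "d2", "title": "", "text": "No title here."}',
    ]
    for out in convert_corpus(sample):
        print(out)
```

To convert a real corpus, iterate over the opened `corpus.jsonl` file instead of `sample` and write each yielded line to an output `.jsonl` file.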

I didn't realize that you were also a major contributor to that repository. Thanks again for your help.