castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Home Page: http://pyserini.io/


Add convenience method to get raw text from dense retrieval for prebuilt indexes

lintool opened this issue

This issue has come up more than once, most recently in #1548.

Our dense indexes don't store the raw text. For a prebuilt index, however, we know which sparse index holds the corresponding text, so it should be possible to implement a raw method that loads that sparse index and fetches the document.
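To make the idea concrete, here is a minimal sketch of how such a lookup could work. It is not Pyserini's implementation. The names `DENSE_TO_SPARSE`, `RawTextLookup`, and `open_sparse`, as well as the index names, are hypothetical and exist only for illustration. The stub classes stand in for a real sparse searcher so the example runs without any index download; they only assume that the sparse side exposes `doc(docid)` returning an object with a `raw()` method, or `None` for an unknown docid.

```python
from typing import Callable, Dict, Optional

# Hypothetical mapping: prebuilt dense index name -> sparse index holding the text.
# The names below are placeholders, not real prebuilt index names.
DENSE_TO_SPARSE: Dict[str, str] = {
    "example-dense-index": "example-sparse-index",
}


class RawTextLookup:
    """Fetches raw text for a dense index by lazily opening its sparse counterpart."""

    def __init__(self, dense_index_name: str,
                 open_sparse: Callable[[str], object],
                 mapping: Dict[str, str] = DENSE_TO_SPARSE):
        self._dense_index_name = dense_index_name
        self._open_sparse = open_sparse  # assumed: opens a sparse index by name
        self._mapping = mapping
        self._sparse = None              # opened on first call to raw()

    def raw(self, docid: str) -> Optional[str]:
        if self._sparse is None:
            sparse_name = self._mapping.get(self._dense_index_name)
            if sparse_name is None:
                raise ValueError(
                    f"No sparse index known for '{self._dense_index_name}'")
            self._sparse = self._open_sparse(sparse_name)
        doc = self._sparse.doc(docid)
        return None if doc is None else doc.raw()


# Stand-ins for a sparse searcher and its documents, for demonstration only.
class _StubDoc:
    def __init__(self, text: str):
        self._text = text

    def raw(self) -> str:
        return self._text


class _StubSparseIndex:
    def __init__(self, docs: Dict[str, str]):
        self._docs = docs

    def doc(self, docid: str) -> Optional[_StubDoc]:
        text = self._docs.get(docid)
        return None if text is None else _StubDoc(text)


opened = []  # records which sparse indexes were opened


def open_stub(name: str) -> _StubSparseIndex:
    opened.append(name)
    return _StubSparseIndex({"d1": "hello world"})


lookup = RawTextLookup("example-dense-index", open_stub)
print(lookup.raw("d1"))
```

In Pyserini itself, `open_sparse` could plausibly be `LuceneSearcher.from_prebuilt_index`, whose `doc(docid)` result exposes `raw()`. Opening the sparse index lazily means users who never ask for raw text don't pay for loading it.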