castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Home page: http://pyserini.io/

Upgrading to Pyserini 0.24: `.raw` attribute no longer available

bevankoopman opened this issue

Upgrading to Pyserini 0.24 causes the following error:

File "/Users/.../service.py", line 44, in search
    hybrid_hits, sparse_raws = self.hybrid(dense_hits, sparse_hits, 0.7, k)
File "/Users/../service.py", line 85, in hybrid
    sparse_raws[hit.docid] = hit.raw
AttributeError: 'io.anserini.search.ScoredDoc' object has no attribute 'raw'

This works fine with Pyserini 0.22.

Has the API changed so that the `raw` attribute no longer exists? What should we use instead? (I also note that many tutorials and code examples still reference this attribute.)

See the discussion in #1758 for alternatives.

I made this API change because the Result object previously loaded the raw document content eagerly; it now does so lazily.
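A minimal, generic Python sketch of the eager-versus-lazy distinction follows. It is not Pyserini's implementation: `LazyHit`, its `contents` property, and the `load_raw` callback are hypothetical stand-ins for a search result and an index lookup.

```python
from functools import cached_property
from typing import Callable


class LazyHit:
    """Hypothetical search result that defers loading its document text."""

    def __init__(self, docid: str, score: float, load_raw: Callable[[str], str]):
        self.docid = docid
        self.score = score
        # Keep a reference to the loader instead of calling it now,
        # so constructing many hits does not touch the index.
        self._load_raw = load_raw

    @cached_property
    def contents(self) -> str:
        # Runs on first access only; the value is then cached on the instance.
        return self._load_raw(self.docid)


calls = []  # records every (hypothetical) index read


def load_raw(docid: str) -> str:
    calls.append(docid)
    return f"contents of {docid}"


# Building 100 hits performs no index reads at all.
hits = [LazyHit(f"doc{i}", 1.0 / (i + 1), load_raw) for i in range(100)]
first = hits[0].contents   # triggers exactly one load
second = hits[0].contents  # served from the cache, no second load
```

The trade-off in this sketch: callers that only need doc IDs and scores never pay for document loading, while callers that read text pay one lookup per hit they actually touch.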

Change to something like this?

hits[0].lucene_document.get('raw')
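Applied to the `hybrid` method from the traceback, the suggested change might look like the sketch below. `get_raw` is a hypothetical helper: it assumes 0.22-era hits expose `.raw` and 0.24 hits expose `lucene_document.get('raw')` (per the suggestion above), so one code path can serve both versions. `FakeLuceneDocument` and the `SimpleNamespace` hits are stand-ins that only make the example runnable without an index.

```python
from types import SimpleNamespace


def get_raw(hit):
    """Return a hit's stored raw text (hypothetical helper; APIs assumed as noted)."""
    raw = getattr(hit, "raw", None)  # Pyserini 0.22: attribute on the hit itself
    if raw is not None:
        return raw
    # Pyserini 0.24: read the stored 'raw' field from the underlying Lucene
    # document; Lucene returns null (None in Python) if the field is absent.
    return hit.lucene_document.get("raw")


class FakeLuceneDocument:
    """Stand-in for a Lucene Document; only get(field_name) is modeled."""

    def __init__(self, fields):
        self._fields = fields

    def get(self, name):
        return self._fields.get(name)


# Stand-ins for hits from each version (not real Pyserini objects).
old_hit = SimpleNamespace(docid="d1", raw='{"contents": "old"}')
new_hit = SimpleNamespace(
    docid="d2", lucene_document=FakeLuceneDocument({"raw": '{"contents": "new"}'})
)

# Equivalent of the loop in hybrid(): map each docid to its raw text.
sparse_raws = {hit.docid: get_raw(hit) for hit in [old_hit, new_hit]}
```

If the index did not store raw documents, `get_raw` returns `None` for 0.24 hits, so callers may want to check for that before parsing.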

Closing, but please reopen if you have any other issues.