castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Home Page: http://pyserini.io/

`pyserini.search.lucene` recall is worse than `LuceneSearcher.search`

Opened by better629.

I indexed the corpus using `pyserini.index --collection JsonCollection --generator DefaultLuceneDocumentGenerator --storePositions --storeDocvectors --storeRaw`.

Then I searched with both `pyserini.search.lucene` and `LuceneSearcher.search` and checked the results against the ground-truth documents.
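For a like-for-like comparison, both runs need to be scored the same way and at the same retrieval depth. The sketch below is one way to do that, under these assumptions: the CLI output is in TREC run format (`qid Q0 docid rank score tag`), and the ground truth is a dict mapping each query id to a set of relevant docids. `parse_trec_run` and `recall_at_k` are hypothetical helpers written for illustration; they are not part of Pyserini.

```python
from collections import defaultdict


def parse_trec_run(lines):
    """Parse TREC run lines ('qid Q0 docid rank score tag') into {qid: [docid, ...]} ordered by rank."""
    hits_by_query = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 6:
            continue  # skip blank or malformed lines
        qid, docid, rank = parts[0], parts[2], int(parts[3])
        hits_by_query[qid].append((rank, docid))
    # Sort each query's hits by rank and keep only the docids.
    return {qid: [docid for _, docid in sorted(hits)] for qid, hits in hits_by_query.items()}


def recall_at_k(run, qrels, k):
    """Mean recall@k over all queries in qrels that have at least one relevant document."""
    per_query = []
    for qid, relevant in qrels.items():
        if not relevant:
            continue  # recall is undefined without relevant documents
        retrieved = set(run.get(qid, [])[:k])  # a missing query counts as zero recall
        per_query.append(len(retrieved & relevant) / len(relevant))
    return sum(per_query) / len(per_query) if per_query else 0.0
```

On the Python side, a run of the same shape can be built as `{qid: [hit.docid for hit in searcher.search(query, k=k)] for qid, query in queries.items()}` (where `queries` is an assumed dict of query id to query text), using the same `k` as the CLI's `--hits` value so that both recall figures are measured at the same depth.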

The result is that the recall of `pyserini.search.lucene` is lower than that of `LuceneSearcher.search`. However, according to https://github.com/castorini/pyserini/blob/master/docs/usage-index.md#building-a-bm25-index-direct-java-implementation, the two should give the same results. Does anyone know why?