castorini / pyserini

Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

Home Page: http://pyserini.io/

Results pagination support in pyserini

bevankoopman opened this issue · comments

How does one do results pagination in Pyserini? (Elasticsearch supports this, and I thought it might inherit that from Lucene itself, but I didn't find anything online about doing it in Pyserini.)

To be clear, by pagination I mean the following situation:

  • A query is issued and the backend determines the ranking of document ids, but document content is only populated for the top k results (page 1 being 0...k);
  • The first page of results is returned to the client;
  • The client can then issue the same query asking for page 2, with the backend returning documents with contents for k...2k (page 2);
  • And so on...

The main purpose is to avoid the latency of having to load document contents for a long result list, especially if the user never looks beyond page 1 of results.

Is there support for this or do we have to build it ourselves?

Thanks!

Any update / thoughts on how to do this?

Hi @bevankoopman - unfortunately, this isn't a feature that's supported in Pyserini right now... so, yea, you'll have to roll it yourself...
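For reference, a minimal sketch of what rolling it yourself might look like, assuming the ranked document ids are already in hand and content is loaded only for the requested page. The names `paginate`, `load_content`, and `fake_loader` are hypothetical and not part of Pyserini. With Pyserini, the ids could presumably come from `LuceneSearcher.search(query, k=page * page_size)` and the loader could wrap `searcher.doc(docid)`; that wiring is an assumption and is not exercised here.

```python
from typing import Callable, List, Sequence, Tuple


def paginate(
    ranked_docids: Sequence[str],
    page: int,
    page_size: int,
    load_content: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Return (docid, content) pairs for one 1-based page of a ranked list."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    start = (page - 1) * page_size
    # Slicing past the end yields an empty sequence, so out-of-range pages are safe.
    window = ranked_docids[start:start + page_size]
    # Content is loaded only for the document ids on the requested page.
    return [(docid, load_content(docid)) for docid in window]


# Stand-in data: a fake ranking and a loader that records what it was asked for.
ranking = [f"doc{i}" for i in range(25)]
loaded: List[str] = []


def fake_loader(docid: str) -> str:
    loaded.append(docid)
    return f"contents of {docid}"


page2 = paginate(ranking, page=2, page_size=10, load_content=fake_loader)
```

Note that in this sketch the ranking itself is still computed down to the end of the requested page; only the content loading is deferred, which is the latency the original question is about.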

How deep into the ranked list are you planning to go? I haven't looked into the implementation in Elasticsearch, but in terms of methods associated with IndexSearcher [1] I don't see any obvious paging support... so I wonder how Elasticsearch implements it...

[1] https://lucene.apache.org/core/9_10_0/core/org/apache/lucene/search/IndexSearcher.html