incomplete fetch of checklist from `/checklist` endpoint
ayushanand18 opened this issue
Overview
When we try to generate an OBIS checklist using the `pyobis.checklist` -> `list()` function, it returns a checklist of at most 10 records. This is because the OBIS API returns only 10 records per query by default.
To fetch subsequent records, we need to pass a `skip` parameter giving the number of records already fetched.
For example, let us look at this request:
https://api.obis.org/v3/checklist?size=10&skip=10&taxonid=1363
It fetches the next 10 records after the first 10 have been fetched.
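To make the role of `skip` and `size` concrete, here is a minimal sketch that builds the request URL for each page. The helper name `checklist_page_url` is hypothetical and not part of pyobis; only the endpoint and the three query parameters come from the request above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.obis.org/v3/checklist"


def checklist_page_url(taxonid: int, skip: int = 0, size: int = 10) -> str:
    """Build the URL for one page of checklist results (hypothetical helper).

    skip is the number of records already fetched; size is the page length.
    """
    query = urlencode({"size": size, "skip": skip, "taxonid": taxonid})
    return f"{BASE_URL}?{query}"


# URLs for the first three pages of 10 records each.
for page in range(3):
    print(checklist_page_url(1363, skip=page * 10))
```

Each successive page simply advances `skip` by the number of records already received.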
To reproduce
Run:
from pyobis.checklist import ChecklistQuery
ChecklistQuery().list(taxonid=1363)["total"] # total records
len(ChecklistQuery().list(taxonid=1363)["results"]) # total fetched
Note: This is not mentioned in the documentation; I found it thanks to OBIS Mapper.
We need to include a pagination process similar to `occurrences.search` here as well. I'm writing a patch for this.
An interesting finding, although I couldn't understand it. I queried the checklist for taxonid 1363. The total showed 2140, yet when I run this query I get zero results:
https://api.obis.org/v3/checklist?taxonid=1363&skip=2129&size=5000
output
{"total":2140,"results":[]}
Something weird and outside my understanding. Please help.
I wonder if this is because Dorylaimina is an Order. The total could be a count of species within the order.
@pieterprovoost: can you shed some light on this?
@7yl4r Elasticsearch approximates the cardinality for better performance, so it's best to paginate until the result set is empty. In this case there are 2129 taxa. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html#_precision_control.
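A minimal sketch of the "paginate until the result set is empty" approach, for anyone following along. `fetch_page` is a hypothetical callable standing in for whatever performs the actual request, and `fake_fetch_page` simulates the case in this thread (2129 taxa, reported total 2140) so the loop can be run without network access. The record shape `{"id": ...}` is a placeholder, not the real response schema.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int, int], Dict], size: int = 1000) -> List[dict]:
    """Collect every record by paging until an empty result set comes back.

    fetch_page(skip, size) is a hypothetical callable returning one decoded
    response such as {"total": 2140, "results": [...]}.
    """
    records: List[dict] = []
    skip = 0
    while True:
        batch = fetch_page(skip, size).get("results", [])
        if not batch:  # empty page: nothing left, whatever "total" claims
            break
        records.extend(batch)
        skip += len(batch)
    return records


def fake_fetch_page(skip: int, size: int) -> Dict:
    """Stand-in for the endpoint: 2129 records, approximate total of 2140."""
    taxa = [{"id": i} for i in range(2129)]
    return {"total": 2140, "results": taxa[skip:skip + size]}


all_taxa = fetch_all(fake_fetch_page, size=500)
print(len(all_taxa))  # 2129
```

The loop never consults `total`, so an approximate count cannot cause it to stop early or request pages forever.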
Noted, thanks @pieterprovoost for the info!