Support for querying more than 50000 results
kgpavinash opened this issue
If I try to retrieve more than 50,000 records, I get the following error:
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
Traceback (most recent call last):
  File "somesoda.py", line 23, in <module>
    result = client.get(medic_identifier, query=finalQuery)
  File "C:\Users\aprabhakar\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sodapy\__init__.py", line 291, in get
    params=params)
  File "C:\Users\aprabhakar\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sodapy\__init__.py", line 406, in _perform_request
    _raise_for_status(response)
  File "C:\Users\aprabhakar\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sodapy\__init__.py", line 460, in _raise_for_status
    raise requests.exceptions.HTTPError(http_error_msg, response=response)
requests.exceptions.HTTPError: 400 Client Error: Bad Request.
    length must be <= 50000
The snippet of code I am using is:
finalQuery = 'SELECT * WHERE year = ' + maxyear + ' AND quarter = ' + maxQuarter + ' ORDER BY ndc DESC LIMIT 50001'
finalQuery2 = 'SELECT COUNT(*) WHERE year = ' + maxyear + ' AND quarter = ' + maxQuarter
result = client.get(medic_identifier, query=finalQuery)
The dataset I am using has 600,000+ results. Is there a way to get all of them?
That error is returned from the server, but the warning up top might give you a hint as to why the server is returning this response:
WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
You might be hitting this limit. Either make the requests with an app_token, or explicitly request a smaller number of results per call (at most 50000, going by the "length must be <= 50000" message) and add an offset to page through the whole dataset.
Ah, I see. I think the offset method you suggested would work perfectly for this situation. Thank you again, xmunoz.
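For anyone landing here later, the offset approach discussed above could look roughly like the sketch below. It is not part of sodapy: `fetch_all` and `page_size` are hypothetical names chosen for illustration, and the only library behaviour it relies on is that `client.get` accepts `limit` and `offset` keyword arguments alongside `where` and `order`. It assumes the query has a stable `order`, since otherwise pages could overlap or skip rows.

```python
def fetch_all(client, dataset_id, page_size=50000, **soql):
    """Collect every matching row by requesting pages of at most page_size rows.

    `client` is expected to behave like a sodapy Socrata client, i.e. expose
    get(dataset_id, limit=..., offset=..., **soql) returning a list of rows.
    """
    # The server in this thread rejected anything above 50000 rows per request.
    if not 0 < page_size <= 50000:
        raise ValueError("page_size must be between 1 and 50000")

    rows = []
    offset = 0
    while True:
        page = client.get(dataset_id, limit=page_size, offset=offset, **soql)
        rows.extend(page)
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            return rows
        offset += page_size


# Hypothetical usage, reusing the variable names from the snippet in the question:
# where = 'year = ' + maxyear + ' AND quarter = ' + maxQuarter
# result = fetch_all(client, medic_identifier, where=where, order='ndc DESC')
```

With 600,000+ rows this makes a dozen or so requests, so passing an app_token when creating the client is still worthwhile to avoid the throttling the warning mentions.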