xmunoz / sodapy

Python client for the Socrata Open Data API

Support for querying more than 50000 results

kgpavinash opened this issue:

If I try to retrieve more than 50,000 records I get the following error:

WARNING:root:Requests made without an app_token will be subject to strict throttling limits.
Traceback (most recent call last):
  File "somesoda.py", line 23, in <module>
    result = client.get(medic_identifier, query=finalQuery)
  File "C:\Users\aprabhakar\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sodapy\__init__.py", line 291, in get
    params=params)
  File "C:\Users\aprabhakar\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sodapy\__init__.py", line 406, in _perform_request
    _raise_for_status(response)
  File "C:\Users\aprabhakar\AppData\Local\Programs\Python\Python37-32\lib\site-packages\sodapy\__init__.py", line 460, in _raise_for_status
    raise requests.exceptions.HTTPError(http_error_msg, response=response)
requests.exceptions.HTTPError: 400 Client Error: Bad Request.
        length must be <= 50000

The snippet of code I am using is:


finalQuery = 'SELECT * WHERE year = ' + maxyear +' AND quarter = '+maxQuarter+' ORDER BY ndc DESC LIMIT 50001'
finalQuery2 = 'SELECT COUNT(*) WHERE year = ' + maxyear +' AND quarter = '+maxQuarter
result = client.get(medic_identifier, query=finalQuery)

The dataset I am using has 600,000+ results. Is there a way to get all of them?

That error is returned from the server, but the warning up top might give you a hint as to why the server is returning this response:

WARNING:root:Requests made without an app_token will be subject to strict throttling limits.

You might be hitting this limit. Either make requests with an app_token, or request at most 50,000 results per call (the cap named in the error message) and add an offset so you can page through the whole dataset.
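A minimal sketch of the offset approach, assuming the same SoQL `query=` style as the snippet above. The helper names `build_page_query` and `fetch_all` are hypothetical, and the `:id` tiebreaker in `ORDER BY` is an assumption: it uses Socrata's row-id system field so that rows sharing an `ndc` value are not skipped or duplicated between pages. The fetch function is passed in, so the paging logic does not depend on a particular client.

```python
from typing import Callable, Dict, Iterator, List

# The server rejected LIMIT 50001 with "length must be <= 50000",
# so each page asks for at most this many rows.
PAGE_SIZE = 50000


def build_page_query(year, quarter, offset: int, page_size: int = PAGE_SIZE) -> str:
    """Build one SoQL page query (hypothetical helper).

    f-strings format ints directly, which avoids the TypeError that
    'year = ' + maxyear raises when maxyear is an int.
    ':id' is an assumed unique tiebreaker so paging stays stable.
    """
    if not 0 < page_size <= PAGE_SIZE:
        raise ValueError("page_size must be between 1 and 50000")
    return (
        f"SELECT * WHERE year = {year} AND quarter = {quarter} "
        f"ORDER BY ndc DESC, :id LIMIT {page_size} OFFSET {offset}"
    )


def fetch_all(
    fetch_page: Callable[[str], List[Dict]],
    year,
    quarter,
    page_size: int = PAGE_SIZE,
) -> Iterator[Dict]:
    """Yield every matching row, one page at a time (hypothetical helper)."""
    offset = 0
    while True:
        rows = fetch_page(build_page_query(year, quarter, offset, page_size))
        yield from rows
        # A short (or empty) page means we've reached the end.
        if len(rows) < page_size:
            break
        offset += page_size
```

With sodapy, the fetch function could wrap the same call used in the issue, for example `rows = list(fetch_all(lambda q: client.get(medic_identifier, query=q), maxyear, maxQuarter))`. For roughly 600,000 rows, that works out to about 12 requests of 50,000 rows each.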

Ah, I see. I think the offset method you suggested will work perfectly for this situation. Thank you again, xmunoz.