DOV-Vlaanderen / pydov

Python package to retrieve data from Databank Ondergrond Vlaanderen (DOV)

Home Page: https://pydov.readthedocs.io/en/latest/

Make 10000 queries limit optional / user defined.

yairlevy opened this issue

  • PyDOV version: 2.0.0
  • Python version: 3.8.3
  • Operating System: Windows 10

Description

The current version of pydov.search.abstract limits queries to 10000 returned features. This is useful for keeping query times manageable, but many analyses require more than that.

Solution

Couldn't the error:

"
raise FeatureOverflowError(
'Reached the limit of {:d} returned features. Please split up '
'the query to ensure getting all results.'.format(10000))
"

rather be replaced by a prompt asking the user whether they would like to continue with the expected lengthy query anyway?

Thank you.

Thanks for the issue. I defer to the DOV maintainers here: I believe this 10k limit has to do with the Geoserver?

Yes, this is not a limit we impose in pydov (nor can we change it here). The sole reason the FeatureOverflowError exists is to warn users that their results will almost certainly be incomplete when we receive 10000 features, because the server never returns more than that, even if more features match.

That being said, I do think a server-side limit like this makes sense. An XML response with 10000 features is already quite large; making it substantially larger would increase the load on the database and Geoserver. We would also risk hitting the timeouts of the reverse proxy, leaving the user with no data at all while still impacting the backend servers.

There are two ways of avoiding this limit:

  • Split up your query into multiple queries so that each returns fewer than 10000 features, and concatenate the resulting dataframes. You can use multiple strategies for this, depending on your use case. E.g. you could query on a string attribute, going from the letter A to Z, or use a geographical filter and move a sliding window or box over your region of interest. This is the option you can use today.
  • Switch to WFS2, which supports paging. This would allow retrieving a large result set in multiple pages/requests. This would require changes in pydov, so it is currently not possible. See #194
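
The first option (a box moved over the region of interest) could look roughly like the sketch below. It is not pydov code: `tile_bbox`, `fetch_tiled`, the `fetch` callable and the `key` field are hypothetical names introduced for illustration, and `fetch` stands in for whatever query you run per box (for instance a pydov search with a location filter, converted to a list of records). Records are merged on a unique key because a feature lying exactly on a tile border may be returned by two neighbouring tiles.

```python
from typing import Callable, Dict, Iterator, List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
Record = Dict[str, object]

# Server-side feature limit discussed in this thread.
FEATURE_LIMIT = 10000


def tile_bbox(bbox: BBox, nx: int, ny: int) -> Iterator[BBox]:
    """Yield nx * ny boxes of equal size that together cover bbox."""
    minx, miny, maxx, maxy = bbox
    dx = (maxx - minx) / nx
    dy = (maxy - miny) / ny
    # Reuse the exact outer bounds for the last edge to avoid float drift.
    xs = [minx + i * dx for i in range(nx)] + [maxx]
    ys = [miny + j * dy for j in range(ny)] + [maxy]
    for i in range(nx):
        for j in range(ny):
            yield (xs[i], ys[j], xs[i + 1], ys[j + 1])


def fetch_tiled(
    bbox: BBox,
    nx: int,
    ny: int,
    fetch: Callable[[BBox], List[Record]],  # hypothetical per-tile query
    key: str,  # hypothetical name of a field that uniquely identifies a record
) -> List[Record]:
    """Query each tile separately and merge the results, one record per key."""
    merged: Dict[object, Record] = {}
    for tile in tile_bbox(bbox, nx, ny):
        records = fetch(tile)
        if len(records) >= FEATURE_LIMIT:
            # This tile is probably incomplete: the grid is too coarse.
            raise RuntimeError(
                "Tile {} reached the feature limit; use more tiles.".format(tile)
            )
        for record in records:
            # Keep the first copy of a record seen in more than one tile.
            merged.setdefault(record[key], record)
    return list(merged.values())
```

If `fetch` returns pandas dataframes instead of lists of records, the merge step would presumably become `pandas.concat` over the per-tile results followed by `drop_duplicates` on the identifying column. The grid size (`nx`, `ny`) has to be chosen so that the densest tile stays below the limit.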

I will close this ticket since this is fixed by switching pydov to WFS2 and paged WFS requests.