DOV-Vlaanderen / pydov

Python package to retrieve data from Databank Ondergrond Vlaanderen (DOV)

Home Page: https://pydov.readthedocs.io/en/latest/

remote_wfs_gml_query tutorial notebook bug

yairlevy opened this issue.

  • PyDOV version: 2.0.0
  • Python version: 3.8.3
  • Operating System: Windows 10 64 bit

Description

The WFS tutorial notebook does not find any records for the hhz with value 'Formatie van Brasschaat (+Merksplas)' and raises an error.

hhz_poly becomes:
b'<wfs:FeatureCollection xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:wfs="http://www.opengis.net/wfs" xmlns:gw_varia="http://dov.vlaanderen.be/grondwater/gw_varia" xmlns:gml="http://www.opengis.net/gml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" numberOfFeatures="0" timeStamp="2021-02-19T08:50:25.712Z" xsi:schemaLocation="http://www.opengis.net/wfs https://www.dov.vlaanderen.be/geoserver/schemas/wfs/1.1.0/wfs.xsd http://dov.vlaanderen.be/grondwater/gw_varia https://www.dov.vlaanderen.be/geoserver/wfs?service=WFS&amp;version=1.1.0&amp;request=DescribeFeatureType&amp;typeName=gw_varia%3Ahhz"/>'

The filter_search call then fails with:
ValueError: Failed to extract geometries from GML file.

The error also occurs with the hhz value '82 - Krijtafzettingen', but it works fine with other hhz values such as 'Paleoceen'.
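The response above is an empty wfs:FeatureCollection (numberOfFeatures="0"). The server returned no hhz feature for that name, so GmlFilter has no geometry to extract, which is consistent with the ValueError.

The sketch below counts the features in the response before it is passed to GmlFilter. It is a minimal sketch that assumes a WFS 1.1.0 GetFeature response like the one above. `count_features` is a hypothetical helper and is not part of pydov or OWSLib.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"


def count_features(gml_response):
    """Return the number of features in a WFS GetFeature response.

    Hypothetical helper (not part of pydov or OWSLib). It trusts the
    numberOfFeatures attribute on wfs:FeatureCollection when the server
    sets it; otherwise it counts gml:featureMember elements and the
    children of gml:featureMembers.
    """
    if isinstance(gml_response, str):
        gml_response = gml_response.encode("utf8")
    root = ET.fromstring(gml_response)

    declared = root.get("numberOfFeatures")
    if declared is not None:
        return int(declared)

    members = root.findall("{%s}featureMember" % GML_NS)
    for container in root.findall("{%s}featureMembers" % GML_NS):
        members.extend(list(container))
    return len(members)
```

For example, a check like `count_features(hhz_poly) == 0` right after the getfeature call would flag the empty response directly. Without it, the problem only surfaces later as "Failed to extract geometries from GML file."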

What I Did

Trying to access hhz information

filter_search = GrondwaterFilterSearch()
df = filter_search.search(
    max_features = 100,
    location = GmlFilter(hhz_poly, Within),
    return_fields = ('pkey_filter', 'x', 'y')
)


ValueError Traceback (most recent call last)
in
2 df = filter_search.search(
3 max_features = 100,
----> 4 location = GmlFilter(hhz_poly, Within),
5 return_fields = ('pkey_filter', 'x', 'y')
6 )

~\.conda\envs\grondwater_dev\lib\site-packages\pydov\util\location.py in __init__(self, gml, location_filter, location_filter_kwargs, combinator)
446 location_filter_kwargs = {}
447
--> 448 self._parse_gml()
449
450 if len(self.subelements) == 1:

~\.conda\envs\grondwater_dev\lib\site-packages\pydov\util\location.py in _parse_gml(self)
509 finally:
510 if gml_tree is not None:
--> 511 self._parse_gml_tree(gml_tree)
512 else:
513 raise ValueError('Failed to parse GML file.')

~\.conda\envs\grondwater_dev\lib\site-packages\pydov\util\location.py in _parse_gml_tree(self, gml_tree)
554
555 if len(self.subelements) == 0:
--> 556 raise ValueError('Failed to extract geometries from GML file.')
557
558 def set_geometry_column(self, geometry_column):

ValueError: Failed to extract geometries from GML file.

The strange thing is that the same notebook works fine on my machine.

Can you try running the following minimal example?

from owslib.etree import etree
from owslib.fes import PropertyIsEqualTo
from owslib.wfs import WebFeatureService

from pydov.search.grondwaterfilter import GrondwaterFilterSearch
from pydov.util.location import GmlFilter, Within
from pydov.util.owsutil import get_namespace

hhz = WebFeatureService(
    'https://www.dov.vlaanderen.be/geoserver/wfs',
    version='1.1.0'
)

namespace = get_namespace(hhz, 'gw_varia:hhz')

naam_filter = PropertyIsEqualTo(
    propertyname='hhz_naam',
    literal='Formatie van Brasschaat (+Merksplas)')

hhz_poly = hhz.getfeature(
    typename='gw_varia:hhz',
    filter=etree.tostring(naam_filter.toXML()).decode("utf8")).read()

filter_search = GrondwaterFilterSearch()
df = filter_search.search(
    max_features=100,
    location=GmlFilter(hhz_poly, Within),
    return_fields=('pkey_filter', 'x', 'y')
)

print(df.head())

Thanks Roel.

Alas, it produced exactly the same error.

Meanwhile, I've taken a closer look at the data requested by the Excel file. The dropdown only offers hhz values preceded by a code and a symbol, whereas the tree object from which the tutorial notebook gathers its hhz values contains neither the code nor the symbol.

The tutorial notebook works for all values except Brasschaat, while my own notebook didn't work with any value. So the values from the user-defined Excel document may need some additional interpretation.

I could solve the issue in my script for values other than Brasschaat as follows:

If we integrate the "tree" object into new code that converts user-defined queries into exact hhz names, the data is retrieved correctly:

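from owslib.etree import etree
from owslib.fes import PropertyIsEqualTo

# Matching hhz: map the user-defined label (HHZ) onto the exact WFS hhz name.
# hhz, namespace and to_bytes are assumed to be defined elsewhere in the script;
# HHZ and locatie come from the user's Excel input.
tree = etree.fromstring(to_bytes(
    hhz.getfeature('gw_varia:hhz', propertyname='hhz_naam').read()))
hhzwaarden = set(i.text for i in tree.findall('.//{%s}hhz_naam' % namespace))
matchinghhz = [s for s in hhzwaarden if s in HHZ]

if locatie == 'HHZ':
    naam_filter = PropertyIsEqualTo(
        propertyname='hhz_naam', literal=matchinghhz[0])
    hhz_poly = hhz.getfeature(
        typename='gw_varia:hhz',
        filter=etree.tostring(naam_filter.toXML()).decode("utf8")).read()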

Bear in mind that this solution may also be ambiguous, since a shortened name could in theory occur in more than one complete hhz value.
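One way to make this substring matching less ambiguous is to prefer the longest WFS name contained in the label. The match should also fail loudly on a tie or when nothing matches, instead of taking matchinghhz[0] from a list built from an unordered set.

The sketch below does this. It is a minimal sketch: `match_hhz` is a hypothetical helper, and the "code - name" label format used in the comments is an assumption about the Excel dropdown values.

```python
def match_hhz(label, hhz_names):
    """Pick the WFS hhz name contained in a user-defined label.

    Hypothetical helper. A name matches when it occurs as a substring of
    the label (e.g. an assumed dropdown label such as "0250 - <name>").
    When several names match, the longest one wins; a tie between equally
    long names, or no match at all, raises ValueError.
    """
    candidates = [name for name in hhz_names if name in label]
    if not candidates:
        raise ValueError("No hhz name found in %r" % label)

    longest = max(len(name) for name in candidates)
    best = [name for name in candidates if len(name) == longest]
    if len(best) > 1:
        raise ValueError("Ambiguous hhz label %r: %s" % (label, sorted(best)))
    return best[0]
```

In the script above, `match_hhz(HHZ, hhzwaarden)` would then replace `matchinghhz[0]` as the literal passed to PropertyIsEqualTo.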

So I presume something in the original encoding is not generic. Maybe UTF-8, or some other encoding?

What do you think?
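The encoding hypothesis can be tested by comparing the two strings code point by code point. This assumes the suspect is an invisible character (such as a non-breaking space) or a different Unicode normalization, rather than the raw byte encoding.

The sketch below is a minimal standard-library version of that check. `codepoint_diff` and `same_after_nfc` are hypothetical names.

```python
import unicodedata
from itertools import zip_longest


def _describe(ch):
    """Format a character as 'U+XXXX NAME', or None past the end of a string."""
    if ch is None:
        return None  # one string is shorter than the other
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch, "<unnamed>"))


def codepoint_diff(a, b):
    """List (index, char_in_a, char_in_b) for every position where a and b differ.

    Invisible differences, e.g. U+00A0 NO-BREAK SPACE instead of U+0020 SPACE,
    show up here even when both strings look identical when printed.
    """
    return [
        (i, _describe(ca), _describe(cb))
        for i, (ca, cb) in enumerate(zip_longest(a, b))
        if ca != cb
    ]


def same_after_nfc(a, b):
    """True if the strings are equal once both are NFC-normalized."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Apply both functions to the value read from the WFS and the string typed in the notebook. An empty list means the strings are identical code point for code point. A non-empty list together with `same_after_nfc` returning True means the difference is only one of Unicode normalization.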

This is very bizarre. I don't see any difference between the locally defined value and the WFS value.

Can you paste the output of the following script?

import difflib
from pprint import pprint

from owslib.etree import etree
from owslib.wfs import WebFeatureService

from pydov.util.owsutil import get_namespace


def to_bytes(data):
    if isinstance(data, bytes):
        return data
    elif isinstance(data, str):
        return data.encode('utf8')


hhz = WebFeatureService(
    'https://www.dov.vlaanderen.be/geoserver/wfs',
    version='1.1.0'
)

namespace = get_namespace(hhz, 'gw_varia:hhz')

tree = etree.fromstring(
    to_bytes(
        hhz.getfeature(
            'gw_varia:hhz',
            propertyname='hhz_naam').read()))
values = set((i.text for i in tree.findall('.//{%s}hhz_naam' % namespace)))

brasschaat_value = [v for v in values if 'Brasschaat' in v][0]
brasschaat_str = 'Formatie van Brasschaat (+Merksplas)'
pprint(list(difflib.ndiff(brasschaat_value, brasschaat_str)))

Here's the result:

[' F',
' o',
' r',
' m',
' a',
' t',
' i',
' e',
' ',
' v',
' a',
' n',
' ',
' B',
' r',
' a',
' s',
' s',
' c',
' h',
' a',
' a',
' t',
' ',
' (',
' +',
' M',
' e',
' r',
' k',
' s',
' p',
' l',
' a',
' s',
' )']

Thanks for your feedback.

Bizarre. There is no difference between the two values.

Are you sure the script I posted earlier does not work? I'm puzzled.

Thank you for your efforts. Unfortunately, I'm sure it doesn't work. This is the error traceback from a run after restarting the notebook kernel:


ValueError Traceback (most recent call last)
in
25 df = filter_search.search(
26 max_features=100,
---> 27 location=GmlFilter(hhz_poly, Within),
28 return_fields=('pkey_filter', 'x', 'y')
29 )

~\.conda\envs\grondwater_dev\lib\site-packages\pydov\util\location.py in __init__(self, gml, location_filter, location_filter_kwargs, combinator)
446 location_filter_kwargs = {}
447
--> 448 self._parse_gml()
449
450 if len(self.subelements) == 1:

~\.conda\envs\grondwater_dev\lib\site-packages\pydov\util\location.py in _parse_gml(self)
509 finally:
510 if gml_tree is not None:
--> 511 self._parse_gml_tree(gml_tree)
512 else:
513 raise ValueError('Failed to parse GML file.')

~\.conda\envs\grondwater_dev\lib\site-packages\pydov\util\location.py in _parse_gml_tree(self, gml_tree)
554
555 if len(self.subelements) == 0:
--> 556 raise ValueError('Failed to extract geometries from GML file.')
557
558 def set_geometry_column(self, geometry_column):

ValueError: Failed to extract geometries from GML file.

I'm going to close this issue, as I'm unable to reproduce it on my machine.

Don't hesitate to open a new issue when you encounter other problems or questions.