qgis / QGIS-Enhancement-Proposals

QEP's (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS

Use of FileGeodatabase spatial index in OpenFileGDB driver (QGIS Grant 2020 program)

rouault opened this issue · comments

Use of FileGeodatabase spatial index in OpenFileGDB driver

Date 2020/05/12

Author Even Rouault (@rouault)

Contact even dot rouault at spatialys dot com

Maintainer @rouault

Version GDAL 3.2

Summary

Outside of the open-source geospatial realm, ESRI is the dominant vendor in the geospatial industry. Regardless of how open-source friendly a particular organisation or user is, the reality is that they will need to interact with ESRI data formats on a regular basis. Many official government data portals provide spatial data only in ESRI formats, and some customers will only supply source data in these formats. It is critical to the success of open-source geospatial software that it has stable and performant capabilities to read these proprietary ESRI formats and to convert this data into standard, open-source friendly formats.

Since all of QGIS' support for reading and writing disk based files is provided by the underlying GDAL library, it is natural that support for these proprietary ESRI formats be added or extended in GDAL itself. While QGIS will directly benefit from this work, investment in GDAL also directly benefits many other open-source geospatial tools, including GRASS GIS, PostGIS, R, rasterio / fiona, MapServer, etc.

While an open-source driver (OpenFileGDB) exists for reading vector datasets stored within ESRI geodatabase (gdb) files, this driver has been developed from reverse engineering efforts. One major component missing from the driver is support for spatial indexes, and its absence results in very slow vector layer access from these files. Unfortunately, storing vectors in GDB files is the standard practice for ESRI software, and accordingly many users are required to access vector data stored in these formats. This is especially critical for users in administrations where the official government data portals are based on ESRI server software, providing these users with no choice but to obtain official datasets in these formats. An optimised open-source GDB driver which can utilise spatial indexing from these formats will directly benefit a huge number of QGIS users.

Proposed Solution

The result of the reverse-engineering of the FileGDB format, as currently implemented by the GDAL OpenFileGDB driver, is published at https://github.com/rouault/dump_gdbtable/wiki/FGDB-Spec .
Thanks to a recent collaboration with Nyall Dawson, we have made significant progress in understanding the structure and content of the spatial index file (.spx) attached to a spatial layer. We found that FileGDB spatial indexing relies on rasterizing feature geometries on a grid with a constant spacing, dependent on the density of features. This spacing is provided in a header field of each vector layer. Depending on how the layer is created, up to 3 grids of different resolutions can be generated, providing multi-resolution spatial indexing.
As a result of this work, we will update the above specification, and implement the decoding and use of the .spx files in the OpenFileGDB driver, so that spatial filter requests by bounding box issued on an OGR layer (SetSpatialFilter() method) will be significantly sped up, especially on large layers.
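
To make the idea of grid-based, multi-resolution indexing more concrete, here is a minimal Python sketch. It is not the .spx on-disk format and not the driver's implementation: the `GridIndex` class, the `max_cells` threshold and the rule for choosing a grid level are all assumptions made for illustration. It only shows the general technique of registering feature bounding boxes in the cells of up to 3 regular grids, then answering a bounding-box query by visiting the cells the query overlaps.

```python
import math
from collections import defaultdict


class GridIndex:
    """Toy multi-resolution grid index over feature bounding boxes.

    Illustration only: names, the max_cells threshold and the level
    selection rule are hypothetical, not taken from the FileGDB format.
    """

    def __init__(self, cell_sizes, max_cells=16):
        if not 1 <= len(cell_sizes) <= 3:
            raise ValueError("expected between 1 and 3 grid resolutions")
        self.cell_sizes = sorted(cell_sizes)  # finest grid first
        self.max_cells = max_cells
        self.grids = [defaultdict(set) for _ in self.cell_sizes]
        self.bboxes = {}

    @staticmethod
    def _cell_range(bbox, size):
        minx, miny, maxx, maxy = bbox
        return (math.floor(minx / size), math.floor(miny / size),
                math.floor(maxx / size), math.floor(maxy / size))

    def _cells(self, bbox, size):
        x0, y0, x1, y1 = self._cell_range(bbox, size)
        for cx in range(x0, x1 + 1):
            for cy in range(y0, y1 + 1):
                yield (cx, cy)

    def insert(self, fid, bbox):
        """Register a feature in the finest grid where it spans few cells."""
        self.bboxes[fid] = bbox
        last = len(self.cell_sizes) - 1
        for level, size in enumerate(self.cell_sizes):
            x0, y0, x1, y1 = self._cell_range(bbox, size)
            count = (x1 - x0 + 1) * (y1 - y0 + 1)
            # Large features fall through to coarser grids; the coarsest
            # grid accepts whatever is left.
            if count <= self.max_cells or level == last:
                for cell in self._cells(bbox, size):
                    self.grids[level][cell].add(fid)
                return

    def query(self, bbox):
        """Return sorted ids of features whose bbox intersects bbox."""
        candidates = set()
        for level, size in enumerate(self.cell_sizes):
            for cell in self._cells(bbox, size):
                candidates |= self.grids[level].get(cell, set())
        # Sharing a cell does not imply intersection, so refine candidates
        # with an exact bounding-box test.
        qminx, qminy, qmaxx, qmaxy = bbox
        hits = []
        for fid in candidates:
            minx, miny, maxx, maxy = self.bboxes[fid]
            if minx <= qmaxx and maxx >= qminx and miny <= qmaxy and maxy >= qminy:
                hits.append(fid)
        return sorted(hits)
```

The point of the sketch is that a query only inspects the few cells it overlaps in each grid, instead of testing every feature of the layer, which is where the expected speed-up on large layers comes from.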

The implementation work will be done purely on the GDAL side. QGIS will automatically benefit from this improvement, when running against the improved GDAL version.
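
From the point of view of client code, the way a bounding-box request is issued would stay the same. As a hedged sketch, the hypothetical helper `fids_in_bbox` below shows the usual pattern with the OGR Python bindings: set a rectangular spatial filter on a layer, iterate over the features, then clear the filter. It accepts any object exposing the `SetSpatialFilterRect()`, `SetSpatialFilter()` and `ResetReading()` methods and iteration behaviour of an `ogr.Layer`.

```python
def fids_in_bbox(layer, minx, miny, maxx, maxy):
    """Return the feature ids of `layer` selected by a bounding-box filter.

    `layer` is assumed to behave like an osgeo.ogr.Layer. The helper name
    is hypothetical and only illustrates the calling pattern.
    """
    # Ask the driver to restrict reading to the given rectangle.
    layer.SetSpatialFilterRect(minx, miny, maxx, maxy)
    try:
        layer.ResetReading()
        return [feature.GetFID() for feature in layer]
    finally:
        # Clear the filter so later reads see the whole layer again.
        layer.SetSpatialFilter(None)
```

With the GDAL Python bindings, `layer` would typically come from `ds = ogr.Open(path)` followed by `ds.GetLayer(0)`, where `path` points to a .gdb directory and `ds` is kept referenced while the layer is in use. Whether the filter is served from the .spx index or by scanning every feature is decided inside the driver, so such code needs no change to benefit.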

Note: handling of compressed FileGDB datasets (.cdf), which the OpenFileGDB driver does not currently support, will remain out of scope.

Backwards Compatibility

This is new functionality with no impact on backwards compatibility.

Further Considerations/Improvements

Once this work is completed, it would open the possibility of adding write capabilities to the OpenFileGDB driver, since the lack of understanding of the spatial index format has so far been a major stumbling block to creating datasets that would be usable in practice. The current use of the FileGDB driver (relying on the proprietary SDK) for write operations could then be retired.

Issue Tracking ID(s)

None

Votes

(required)

That's great news!

For my information, is this QEP here to advertise this GDAL major improvement, or will there be anything to change in QGIS itself?

.. ok I think I get it, you prepare a grant proposal right? :)

> will there be anything to change in QGIS itself?

no, as mentioned in the details ;-)

> you prepare a grant proposal right? :)

yes, as labeled :-)

This will be a BIG helper to mixed-software environments, fingers crossed it goes ahead!

Personally, this is a huge win for a QGIS End-User since as you've mentioned, many data are provided only in ESRI-specific formats.

Professionally, as someone who works for a small/local government as their GIS Analyst, this is an even bigger win. Demonstrating flawless support for existing and future "industry-standard" data formats while taking advantage of QGIS' improved rendering performance and features would go a long way towards making a push to a FOSS stack palatable for any administration.

One of the repeated concerns that I've yet to allay has been the idea that "Everyone else uses ESRI, so we must as well, otherwise we're not compatible". I've tried explaining how in 99% of the cases we're currently compatible, but that 1% is where we stick.

This goes a very long way towards removing that stumbling block, and makes a very clear/easy path forward into a mixed FOSS/ESRI environment, allowing for slow roll-out without a hard cut-over.

Big support for this enhancement, especially if/when it leads to write support for vector FileGDBs. Having read/write support in QGIS out of the box (and other tools that use GDAL) will be a massive UX improvement.

Was implemented some time ago in GDAL master (3.2dev) per OSGeo/gdal#2771