qgis / QGIS-Enhancement-Proposals

QEPs (QGIS Enhancement Proposals) are used in the process of creating and discussing new enhancements for QGIS.

Rework handling of multi-layer, mixed-format datasets

nyalldawson opened this issue

QGIS Enhancement: Rework handling of multi-layer datasets

Date 2021/03/19

Author Nyall Dawson (@nyalldawson)

Contact nyall dot dawson at gmail dot com

Version QGIS 3.20 or 3.22

Summary

Many common spatial data formats support the storage of multiple layers of data. Furthermore, many of these formats allow for storing different types of layers within a single dataset, e.g. storing both raster and vector layers in a single file. Commonly encountered formats which support this include:

  • GeoPackage (mix of vector and raster layers)
  • KML/KMZ (mix of vector and raster layers)
  • GeoPDF (mix of vector and raster layers)
  • NetCDF (mix of raster and mesh datasets)

Currently, QGIS has poor support for these mixed layer-type data formats. Some of the issues in current versions include:

  • Dragging and dropping a GeoPackage containing both vector and raster datasets onto QGIS opens two separate dialogs, prompting the user first for which raster layers to add and then for which vector layers to add.

  • Dragging and dropping other mixed-format datasets onto QGIS results in only a single "select layers" dialog showing one layer type (e.g. raster OR vector), and users never get the option to add the other layer types. (The only workaround is to add the layers through one of the specific Data Source Manager tabs.)
  • Datasets appear multiple times in the browser panel (once per available layer type), causing clutter and user confusion.
  • Some formats (e.g. KML/KMZ) appear only once in the browser panel, as just one of the available layer types. For example, KML files show in the browser panel only as vector layers, and users have no way of knowing that the datasets also contain valid raster layers.

It is important to note that GeoPackage currently has generally quite good mixed-format support in QGIS, but this is due to many hard-coded workarounds added for the GeoPackage format only, which can't be extended to other data formats.

Furthermore, the situation is complicated because the current QGIS API for handling sublayers inside a dataset is very old and extremely limited. The API is also very inefficient: it requires a raster or vector layer to be fully constructed before the full list of sublayers can be retrieved, only for this layer to be discarded and the actual desired sublayer opened, resulting in unnecessary work and network/disk usage. Finally, the API is unfriendly for third-party scripts and plugins to reuse for their own purposes.

Proposed Solution

This project consists of two components:

  1. Reworking the QGIS API for handling multi-layer datasets, in order to make it more flexible, stable, efficient, and easy to use.
  2. Reworking the QGIS UI for exposing multi-layer datasets to users to utilise the new API and provide an optimal user experience regardless of the data format.

Proposed API

A new struct/data class will be created to provide a stable and structured way of storing sublayer details. (The current API relies on poor-quality hacks, such as returning a list of strings corresponding to sublayers, where each string packs together the layer name, data type, description and other components, all delimited by a special "!!::!!" separator.) For example:

class QgsSublayerDetails
{
  public:

    //! Associated data provider key
    QString providerKey;

    //! Type of layer
    QgsMapLayerType type;

    //! Layer name
    QString name;

    //! Layer description
    QString description;

    //! Feature count, for vector layers only
    long featureCount;

    //! Geometry column name, for vector layers only
    QString geometryColumnName;

    // ... etc ...

};
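To illustrate why the delimiter-joined strings are fragile, here is a minimal sketch using standard C++ types only. The helper names (encodeLegacy, decodeLegacy) and the field order are hypothetical and are not the actual QGIS implementation; only the "!!::!!" separator comes from the description above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Separator described in the proposal text.
const std::string kSeparator = "!!::!!";

// Hypothetical encoder: joins the fields with the separator, without escaping.
std::string encodeLegacy( const std::vector<std::string> &fields )
{
  std::string out;
  for ( std::size_t i = 0; i < fields.size(); ++i )
  {
    if ( i > 0 )
      out += kSeparator;
    out += fields[i];
  }
  return out;
}

// Hypothetical decoder: splits on every occurrence of the separator.
std::vector<std::string> decodeLegacy( const std::string &encoded )
{
  std::vector<std::string> fields;
  std::size_t start = 0;
  while ( true )
  {
    const std::size_t pos = encoded.find( kSeparator, start );
    if ( pos == std::string::npos )
    {
      fields.push_back( encoded.substr( start ) );
      break;
    }
    fields.push_back( encoded.substr( start, pos - start ) );
    start = pos + kSeparator.size();
  }
  return fields;
}
```

With this scheme, a layer name that happens to contain the separator is decoded into one field too many, and every following field shifts position. A struct with named members avoids that class of error entirely.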

The QgsProviderMetadata class will gain a new virtual method allowing the corresponding provider to query a URI and return a list of any valid sublayers contained in the dataset which that provider can handle.

/**
 * Queries the specified \a uri and returns a list of any valid sublayers found in the dataset which can be handled by this provider.
*/
virtual QList< QgsSublayerDetails > querySublayers( const QString& uri );

Individual providers will be able to utilise whichever shortcuts apply to that specific provider for determining the list of sublayers they can open (WITHOUT the expense of creating a full QgsMapLayer object in order to do this). The method will initially be implemented for the OGR, GDAL and MDAL (mesh) data providers.
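The shape of this per-provider hook might look roughly like the sketch below. It uses std::string and std::vector as stand-ins for QString and QList so that it compiles without Qt, and the class names (ProviderMetadata, FakeVectorMetadata), the "fake_ogr" key and the file-extension check are all hypothetical; a real provider would inspect the dataset through its own library.

```cpp
#include <string>
#include <vector>

// Simplified stand-ins for QgsMapLayerType and QgsSublayerDetails.
enum class LayerType { Vector, Raster, Mesh };

struct SublayerDetails
{
  std::string providerKey;
  LayerType type;
  std::string name;
};

// Stand-in for QgsProviderMetadata: by default a provider reports no sublayers.
class ProviderMetadata
{
  public:
    virtual ~ProviderMetadata() = default;

    virtual std::vector<SublayerDetails> querySublayers( const std::string &uri ) const
    {
      ( void )uri;
      return {};
    }
};

// Returns true if text ends with suffix.
bool endsWith( const std::string &text, const std::string &suffix )
{
  return text.size() >= suffix.size()
         && text.compare( text.size() - suffix.size(), suffix.size(), suffix ) == 0;
}

// Hypothetical provider: recognises ".gpkg" URIs and reports one fixed vector
// layer, without constructing any map layer object.
class FakeVectorMetadata : public ProviderMetadata
{
  public:
    std::vector<SublayerDetails> querySublayers( const std::string &uri ) const override
    {
      if ( !endsWith( uri, ".gpkg" ) )
        return {};
      return { { "fake_ogr", LayerType::Vector, "roads" } };
    }
};
```

The point of the design is that the cheap inspection lives in the provider's metadata class, so callers never need to build a full layer just to list what a dataset contains.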

Lastly, the QgsProviderRegistry class will have a similar method for querying a URI against ALL registered data providers and collating a complete list of sublayers which can be handled by any data provider (e.g. OGR, GDAL, MDAL, etc.):

/**
 * Queries the specified \a uri and returns a list of any valid sublayers found in the dataset which can be handled by any registered data provider.
 *
 * This method iteratively queries each registered data provider and returns the complete collated list of all valid sublayers found in the dataset which can be opened by the data providers.
*/
virtual QList< QgsSublayerDetails > querySublayers( const QString& uri );
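The collation step could be sketched as follows. This is an illustration under simplifying assumptions rather than the proposed implementation: the ProviderRegistry class, its registerProvider() method and the use of std::function callbacks in place of QgsProviderMetadata objects are all hypothetical.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for QgsSublayerDetails.
struct SublayerDetails
{
  std::string providerKey;
  std::string name;
};

// Hypothetical registry mapping a provider key to that provider's query function.
class ProviderRegistry
{
  public:
    using QueryFunction = std::function<std::vector<SublayerDetails>( const std::string & )>;

    void registerProvider( const std::string &key, QueryFunction query )
    {
      mProviders[key] = std::move( query );
    }

    // Asks every registered provider about the URI and concatenates the results.
    // Providers are visited in key order because std::map is sorted.
    std::vector<SublayerDetails> querySublayers( const std::string &uri ) const
    {
      std::vector<SublayerDetails> all;
      for ( const auto &entry : mProviders )
      {
        const std::vector<SublayerDetails> found = entry.second( uri );
        all.insert( all.end(), found.begin(), found.end() );
      }
      return all;
    }

  private:
    std::map<std::string, QueryFunction> mProviders;
};
```

A caller such as a unified "select layers" dialog would then need only this one call to see the vector, raster and mesh sublayers of a file together.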

This API will be exposed to PyQGIS, giving third-party scripts and plugins an easy-to-use, stable API for querying all valid sublayers in a dataset.

Proposed UI changes

  1. The current separate dialogs which are used to prompt users for raster and vector sublayers to add from a dataset will be reworked into a single unified sublayer selection dialog, which uses the newly added APIs to show users a complete list of ALL valid sublayers in the file, regardless of the data provider or layer type.

  2. The Browser panel code will be significantly reworked so that any file which contains multiple sublayers automatically shows as an expandable tree item, containing ALL the valid sublayers regardless of the data provider. This will be handled directly via the new API, and consequently will automatically apply to all data providers which can handle a particular file, without hardcoded, provider-specific workarounds. (Furthermore, it will also work correctly with any plugin-based data providers, giving them a first-class integrated appearance!) This change will mean that every file appears only a single time in the browser panel, with users able to expand the file to see ALL valid sublayers and then drag and drop these to add them as vector, raster, mesh, etc. layers. Ultimately all data types will get the same first-rate browser user experience that GeoPackage files have in current QGIS versions.

Much needed enhancement! I also strongly concur that the current way of dealing with sublayers through magical delimiters is extremely fragile and error-prone (a recent breakage in 3.18.0 occurred just because of that), and we definitely need a cleaner approach.

I can imagine that querySublayers() could accept arguments so that the user can specify which details it wants to know and/or a mode "get all details that are cheap to retrieve" and/or "approximate values are OK". Getting the feature count, or the geometry type (if a vector layer has mixed geometry type content, we often need to iterate over all its features to figure the actual geometries it holds), or geometry column name can be rather costly operations on some data sources, compared to just getting the layer name and type (raster, vector, ...)

I was wondering if we should make provision for a future hierarchical presentation of layers. This would really be for later use, as I don't think many data sources can take advantage of it. One use case I have in mind is KML, which is naturally hierarchical (WMS capabilities also offer a hierarchical view, and netCDF / HDF5 can have a hierarchical structure too). At the GDAL / OGR level everything is currently flattened, but if the GDAL data model were extended and the drivers updated, this could flow into QGIS. Maybe add a QList<std::unique_ptr<QgsSublayerDetails>> children member to QgsSublayerDetails? querySublayers() could then return a single root QgsSublayerDetails object, which would offer a QgsSublayerDetails::flatLeafLists() method returning a QList of all leaf nodes of the hierarchy. This is just brainstorming, and is left to the discretion of the proponent. The idea would be to have the data structure ready for such a potential extension without a future API break; the UI and other parts of the application could use the flattened view for now and be updated later to support a hierarchical presentation.
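As a rough sketch of the tree idea floated in this comment, using standard C++ containers in place of Qt ones: the SublayerNode struct and its addChild() helper are hypothetical names, and flatLeafList() simply walks the tree depth-first and collects nodes that have no children.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hierarchical variant of the sublayer details structure.
struct SublayerNode
{
  std::string name;
  std::vector<std::unique_ptr<SublayerNode>> children;

  explicit SublayerNode( std::string nodeName ) : name( std::move( nodeName ) ) {}

  // Appends a child and returns a non-owning pointer to it.
  SublayerNode *addChild( const std::string &childName )
  {
    children.push_back( std::make_unique<SublayerNode>( childName ) );
    return children.back().get();
  }

  // Returns all leaf nodes in depth-first order. A node with no children
  // counts as a leaf, so a childless root returns itself.
  std::vector<const SublayerNode *> flatLeafList() const
  {
    std::vector<const SublayerNode *> leaves;
    collectLeaves( leaves );
    return leaves;
  }

  private:
    void collectLeaves( std::vector<const SublayerNode *> &leaves ) const
    {
      if ( children.empty() )
      {
        leaves.push_back( this );
        return;
      }
      for ( const auto &child : children )
        child->collectLeaves( leaves );
    }
};
```

Code that only understands flat lists can keep calling flatLeafList(), while a tree-aware UI could walk children directly.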

@rouault

I can imagine that querySublayers() could accept arguments so that the user can specify which details it wants to know and/or a mode "get all details that are cheap to retrieve" and/or "approximate values are OK". Getting the feature count, or the geometry type (if a vector layer has mixed geometry type content, we often need to iterate over all its features to figure the actual geometries it holds), or geometry column name can be rather costly operations on some data sources, compared to just getting the layer name and type (raster, vector, ...)

Great point. I'll add a Flags argument which defaults to no flags, with optional flags available for forcing resolution of the geometry type and counting features in situations where we know this will potentially be expensive.
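A flags argument of that kind might look something like the sketch below. The flag names (ResolveGeometryType, CountFeatures), the -1 "unknown" sentinel and the in-memory FakeLayer type are assumptions made purely for illustration and are not part of the proposal text.

```cpp
#include <string>
#include <vector>

// Hypothetical flag names; the default is to do only cheap work.
enum SublayerQueryFlag : unsigned
{
  NoFlags = 0,
  ResolveGeometryType = 1u << 0,
  CountFeatures = 1u << 1,
};

// Hypothetical sentinel meaning "feature count was not computed".
constexpr long kUnknownCount = -1;

struct SublayerDetails
{
  std::string name;
  long featureCount = kUnknownCount;
  std::string geometryType = "Unknown";
};

// Toy in-memory layer: one geometry type name per feature.
struct FakeLayer
{
  std::string name;
  std::vector<std::string> featureGeometries;
};

std::vector<SublayerDetails> querySublayers( const std::vector<FakeLayer> &dataset,
                                             unsigned flags = NoFlags )
{
  std::vector<SublayerDetails> result;
  for ( const FakeLayer &layer : dataset )
  {
    SublayerDetails details;
    details.name = layer.name; // always cheap

    if ( flags & CountFeatures )
      details.featureCount = static_cast<long>( layer.featureGeometries.size() );

    if ( ( flags & ResolveGeometryType ) && !layer.featureGeometries.empty() )
    {
      // Scans every feature, which is the expensive part on real data sources.
      details.geometryType = layer.featureGeometries.front();
      for ( const std::string &geometry : layer.featureGeometries )
      {
        if ( geometry != details.geometryType )
        {
          details.geometryType = "Mixed";
          break;
        }
      }
    }
    result.push_back( details );
  }
  return result;
}
```

Callers that only need names and layer types pay nothing extra, while a caller that really needs counts opts in explicitly.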

I was wondering if we should have a provision for a future hierarchical presentation of layers.

Also a good point. I'd suggest we could do this very simply by just adding a QStringList "hierarchy" or "path" member to QgsSublayerDetails, and then leave it up to the caller to decide how they want to present this information (as a single formatted string or via a tree view of the schema directories).
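To make the "path" idea concrete, here is a small sketch of both presentations a caller might choose. It assumes the path lists parent groups from outermost to innermost; std::vector<std::string> stands in for QStringList, and formattedPath(), TreeNode and buildTree() are hypothetical helper names.

```cpp
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for QgsSublayerDetails with the suggested path member.
struct SublayerDetails
{
  std::string name;
  std::vector<std::string> path; // parent groups, outermost first (assumption)
};

// Presentation 1: a single formatted string such as "Group/Subgroup/layer".
std::string formattedPath( const SublayerDetails &details, const std::string &separator = "/" )
{
  std::string out;
  for ( const std::string &part : details.path )
    out += part + separator;
  return out + details.name;
}

// Presentation 2: a tree of groups, each holding the names of its layers.
struct TreeNode
{
  std::string name;
  std::vector<std::unique_ptr<TreeNode>> groups;
  std::vector<std::string> layers;

  // Finds the child group with the given name, creating it if necessary.
  TreeNode *childGroup( const std::string &groupName )
  {
    for ( const auto &group : groups )
    {
      if ( group->name == groupName )
        return group.get();
    }
    groups.push_back( std::make_unique<TreeNode>() );
    groups.back()->name = groupName;
    return groups.back().get();
  }
};

TreeNode buildTree( const std::vector<SublayerDetails> &sublayers )
{
  TreeNode root;
  for ( const SublayerDetails &details : sublayers )
  {
    TreeNode *node = &root;
    for ( const std::string &part : details.path )
      node = node->childGroup( part );
    node->layers.push_back( details.name );
  }
  return root;
}
```

Because the list returned by the query stays flat, existing callers are unaffected, and only callers that care about hierarchy need to look at the path.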

Handling QGIS projects saved within GeoPackages would also be great!