planetlabs / gpq

Utility for working with GeoParquet

Home Page: https://planetlabs.github.io/gpq/

More info about the data?

cholmes opened this issue

The new gpq validation is awesome, but it'd be nice if it was easy to get a few more bits of info:

  • Number of features
  • Geometry type
  • CRS info
  • Bounding box (I do see this in 'describe', but I think it'd be nice to be higher level to get a sense without having to see all the columns).
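
For illustration, here is a minimal Python sketch of the kind of overview being asked for. It assumes the file's `geo` metadata (the JSON defined by the GeoParquet spec) and the Parquet row count have already been read by some other means. `summarize_geo_metadata` is a hypothetical helper, not part of gpq, and the example metadata is made up.

```python
import json


def summarize_geo_metadata(geo_json, num_rows):
    """Return a short overview of the primary geometry column.

    geo_json: the GeoParquet "geo" file metadata as a JSON string.
    num_rows: the Parquet row count (assumed to be read separately).
    """
    geo = json.loads(geo_json)
    primary = geo["primary_column"]
    column = geo["columns"][primary]

    # Per the GeoParquet spec, a missing "crs" key means OGC:CRS84,
    # while an explicit null means the CRS is unknown.
    if "crs" not in column:
        crs = "OGC:CRS84 (default)"
    elif column["crs"] is None:
        crs = "unknown"
    else:
        crs = column["crs"].get("name", "unnamed CRS")

    return {
        "features": num_rows,
        "geometry_column": primary,
        # An empty list means any geometry type may be present.
        "geometry_types": column.get("geometry_types", []),
        "crs": crs,
        "bbox": column.get("bbox"),  # optional in the spec
    }


# Made-up metadata for a file with one geometry column.
example = json.dumps({
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Polygon"],
            "bbox": [-10.0, -5.0, 10.0, 5.0],
        }
    },
})

summary = summarize_geo_metadata(example, num_rows=3)
for key, value in summary.items():
    print(f"{key}: {value}")
```

This leaves out every other column on purpose, which is the "overview without all the columns" idea.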

I could see two routes for this:

  1. Report them during validation. For example, the data section could say how many features were validated as having proper info. Instead of just 'all geometry types must be included in geometry_types metadata', it could say 'geometry type metadata is Polygon, and all geometries are polygons'. Similarly for the bounding box: report the bounding box and whether all geometries fall within it.

(This does highlight two potential 'warnings': the reported bbox being much bigger than the actual bounds of the geometry, and the geometry types metadata being more flexible than needed, e.g. it isn't specified but all the data is actually Polygons. Ideally there'd be nice quick operations in gpq to fix this.)

  2. Have an 'info' command like ogrinfo, that just reports on this info.

I think it could make sense to build on the existing describe command for some of this (reporting row count, etc.). The current output is JSON, but we could accept a --format argument and have text output as well (this could also be the default).

The warnings about an overly large bbox or a larger set of geometry types than used do make sense in the validator.
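
As a rough sketch of what those two checks could look like (in Python rather than gpq's own code): the function name, the area-ratio threshold, and the message wording are all assumptions for illustration, not anything the validator currently does.

```python
def bbox_area(bbox):
    """Area of a [minx, miny, maxx, maxy] box; 0 for degenerate boxes.

    This sketch ignores boxes that cross the antimeridian (minx > maxx).
    """
    minx, miny, maxx, maxy = bbox
    return max(maxx - minx, 0) * max(maxy - miny, 0)


def best_practice_warnings(declared_bbox, actual_bbox,
                           declared_types, observed_types,
                           max_area_ratio=2.0):
    """Return warning strings for overly loose bbox or geometry_types.

    max_area_ratio is a hypothetical threshold: warn when the declared
    bbox covers more than this many times the area of the actual bounds.
    """
    warnings = []

    declared_area = bbox_area(declared_bbox)
    actual_area = bbox_area(actual_bbox)
    # Skip the ratio check when the data has no area (e.g. a single point).
    if actual_area > 0 and declared_area / actual_area > max_area_ratio:
        warnings.append(
            f"bbox {declared_bbox} is much larger than the actual bounds {actual_bbox}"
        )

    observed = set(observed_types)
    if not declared_types and observed:
        # An empty geometry_types list means "any type".
        warnings.append(
            f"geometry_types is unspecified, but all geometries are {sorted(observed)}"
        )
    else:
        unused = sorted(set(declared_types) - observed)
        if unused:
            warnings.append(f"geometry_types lists unused types {unused}")

    return warnings


# Whole-world bbox and unspecified types on a small polygon dataset.
loose = best_practice_warnings(
    declared_bbox=[-180, -90, 180, 90],
    actual_bbox=[0, 0, 10, 10],
    declared_types=[],
    observed_types=["Polygon"],
)
for message in loose:
    print("warning:", message)
```

A ratio on area is only one way to define "much bigger"; comparing each edge against the actual bounds would be another reasonable choice.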

Cool, that'd make sense to me. Though ideally with an option to turn off the description of all the columns, sometimes there's a ton of them and I just want an overview of the info.

The default output format from the gpq describe command is now a more compact table (the --format json output includes more information for nested fields). This includes geometry types and bbox if it is present for geometry columns in the geo metadata. I didn't change anything about the validator output. Maybe we can open individual issues about adding some "best practices" rules.