Type indications for user interfaces in parameter schemas

Question

m-mohr opened this issue 6 months ago

We recently concluded Testbed 19 and had a work item where we tried to combine openEO and OGC API - Processes. I worked on a visual client https://m-mohr.github.io/gdc-web-editor/

What we realized is that the UI generation for processes exposed by OGC APIs is much harder as we only have limited indication by the JSON Schema. In openEO we started a list of "subtypes" that define common types of input, e.g. collection, bbox, date, duration, epsg-code, geojson, wkt2, year, etc. This helps to render a much more user-friendly UI. For example, while a bbox would usually just be rendered as a list of 4 numbers in the UI, a client can now render a map where you can select a bbox. Some non-visual clients also benefit from it. See https://github.com/Open-EO/openeo-processes/blob/master/meta/subtype-schemas.json for a list of subtypes. This is open to extensions.

The subtype is a custom keyword to JSON Schema and has its own meta-schema, so it also validates in JSON Schema validators. While I think it might not be something for the spec itself, it certainly could go into a best practice.

I think this is something that OGC API - Processes currently doesn't have, so could be something to align between OGC API - Processes and openEO. We now have a client (GDC Web Editor) that can connect to both OGC API - Processes and openEO and I'd hope we see more in the future. Such subtypes would be greatly beneficial and make OGC APIs much more accessible to non-coders.

Example:

This is the process description for it:
https://github.com/Open-EO/openeo-processes/blob/master/load_collection.json
Search for the subtype properties in the parameter schemas.

The map, date and the band selection wouldn't be possible as such without subtypes. Only the date selection could be achieved somewhat through the format keyword.
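
To make the idea concrete, here is a minimal Python sketch of how a client might choose an input widget from a parameter schema. The subtype names come from this thread; the widget names and the helper `pick_widget` are hypothetical and are not taken from the GDC Web Editor.

```python
# Hypothetical mapping from openEO-style subtypes to UI widgets.
# The widget names are invented for illustration only.
SUBTYPE_WIDGETS = {
    "bounding-box": "map-bbox-picker",
    "collection-id": "collection-dropdown",
    "date": "date-picker",
    "epsg-code": "crs-search",
}

# Fallback widgets keyed by the plain JSON Schema type.
TYPE_WIDGETS = {
    "number": "number-input",
    "integer": "number-input",
    "string": "text-input",
    "boolean": "checkbox",
    "array": "list-editor",
    "object": "json-editor",
}


def pick_widget(schema: dict) -> str:
    """Prefer the subtype hint; fall back to the structural type."""
    subtype = schema.get("subtype")
    if subtype in SUBTYPE_WIDGETS:
        return SUBTYPE_WIDGETS[subtype]
    json_type = schema.get("type")
    if isinstance(json_type, list):
        # e.g. ["number", "null"]: use the first non-null type
        json_type = next((t for t in json_type if t != "null"), None)
    # Unknown subtypes and unknown types end up in a generic editor.
    return TYPE_WIDGETS.get(json_type, "json-editor")
```

An unknown subtype is simply ignored here, so a schema still gets a usable (if less friendly) widget based on its type alone.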

Thoughts?

Francis Charette-Migneault · Answer 1 · Fri Feb 16 2024 01:22:28 GMT+0800 (China Standard Time)

I'm curious about the decision around subtype.
Why not instead use $ref with a reference to a well established JSON-schema with #<definition-name> to pick the specific definition? Could also make use of the $id property of JSON-schema to identify a known type from a definition catalog.

Matthias Mohr · Answer 2 · Fri Feb 16 2024 01:51:49 GMT+0800 (China Standard Time)

My main point here is about having more specific type indications, not necessarily a specific "encoding". So whether it's called subtype, $ref, format or something else is not so important for me.

But to give some background:

We started initially with the format keyword, but back in the days there were discussions in JSON Schema to deprecate format, so we tried to avoid format and decided to define a new JSON Schema keyword with a meta-schema to validate it.
About $ref:
1. In openEO we want processes to be self-contained because otherwise they are pretty much indigestible by web clients (unless you want to spawn dozens of HTTP requests to resolve them all). One request to GET /processes was meant to provide you all information needed for process graph construction.
2. Another point is that in draft-07 of JSON Schema (2019/2020 didn't exist at the time we defined openEO), $ref couldn't live alongside other properties, so that makes things more complicated and doesn't easily allow customizing/restricting subtypes. Yes, there's allOf, but for simplicity we try to avoid allOf, oneOf and anyOf as much as possible.
3. I guess we could've implied specific types by the URL in the $ref, but honestly we also just didn't think about it. Probably mostly due to point 2.
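
As a rough illustration of points 1 and 2, a server could inline local references before publishing a process description, so that clients never have to resolve a $ref themselves. The sketch below is not openEO code: the helper `inline_refs` and the `#/definitions/` layout are assumptions, and it only handles local, non-circular references.

```python
PREFIX = "#/definitions/"  # assumed location of shared definitions


def inline_refs(schema, definitions):
    """Return a copy of `schema` with local '#/definitions/<name>' refs inlined.

    Keywords written next to a $ref are merged over the referenced
    definition. Plain draft-07 validation ignores such siblings, which is
    why the merge happens here, before the schema is published.
    Assumes there are no circular references.
    """
    if isinstance(schema, list):
        return [inline_refs(item, definitions) for item in schema]
    if not isinstance(schema, dict):
        return schema
    ref = schema.get("$ref")
    is_local = isinstance(ref, str) and ref.startswith(PREFIX)
    resolved = {}
    if is_local:
        target = definitions[ref[len(PREFIX):]]
        resolved.update(inline_refs(target, definitions))
    for key, value in schema.items():
        if key == "$ref" and is_local:
            continue  # replaced by the inlined definition
        resolved[key] = inline_refs(value, definitions)
    return resolved
```

External references (anything not starting with the assumed prefix) are left untouched, so this does not by itself remove the need for HTTP requests if a schema points to other documents.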

Francis Charette-Migneault · Answer 3 · Fri Feb 16 2024 02:58:49 GMT+0800 (China Standard Time)

I agree with having specific identifiers. Even for something as simple as a bounding box, it should be made obvious to distinguish it from any other array of numbers. A naming authority should be used as well to provide more context, linking references, and definition details. In the case of a bounding box for example, I would love to have processes indicate something similar to:

"$id": "http://www.opengis.net/def/glossary/term/BoundingBox"

Panagiotis (Peter) A. Vretanos · Answer 4 · Mon Apr 29 2024 06:37:22 GMT+0800 (China Standard Time)

I like the "$id"/subtype/(or whatever we call it) idea and I am trying to figure out what to do to resolve this issue but I am confused about something.

In the example that @m-mohr cites, https://github.com/Open-EO/openeo-processes/blob/master/load_collection.json, I see things like this (for a bounding box in this case):

{
  "title": "Bounding Box",
  "type": "object",
  "subtype": "bounding-box",
  "required": [
    "west",
    "south",
    "east",
    "north"
  ],
  "properties": {
    "west": {
      "description": "West (lower left corner, coordinate axis 1).",
      "type": "number"
    },
    "south": {
      "description": "South (lower left corner, coordinate axis 2).",
      "type": "number"
    },
    "east": {
      "description": "East (upper right corner, coordinate axis 1).",
      "type": "number"
    },
    "north": {
      "description": "North (upper right corner, coordinate axis 2).",
      "type": "number"
    },
    "base": {
      "description": "Base (optional, lower left corner, coordinate axis 3).",
      "type": [
        "number",
        "null"
      ],
      "default": null
    },
    "height": {
      "description": "Height (optional, upper right corner, coordinate axis 3).",
      "type": [
        "number",
        "null"
      ],
      "default": null
    },
    "crs": {
      "description": "Coordinate reference system of the extent, specified as as [EPSG code](http://www.epsg-registry.org/) or [WKT2 CRS string](http://docs.opengeospatial.org/is/18-010r7/18-010r7.html). Defaults to `4326` (EPSG code 4326) unless the client explicitly requests a different coordinate reference system.",
      "anyOf": [
        {
          "title": "EPSG Code",
          "type": "integer",
          "subtype": "epsg-code",
          "minimum": 1000,
          "examples": [
            3857
          ]
        },
        {
          "title": "WKT2",
          "type": "string",
          "subtype": "wkt2-definition"
        }
      ],
      "default": 4326
    }
  }
},

If there is an "$id" or "subtype" defined for this, why would I need ALL this schema? Why would the input's schema not just be (using "subtype" as the identifier token):

{
  "subtype": "bounding-box"
}

Presumably the identifier "bounding-box" would imply ALL the rest. No?

Matthias Mohr · Answer 5 · Mon Apr 29 2024 17:16:12 GMT+0800 (China Standard Time)

@pvretano That has three main reasons for openEO, but is a design decision that can be decided differently in OAP:

1. We didn't want to force clients to resolve external references. See also #395 (comment). (Similarly, we recommend for /queryables that servers resolve $refs before sending them to the clients.)
2. We wanted implementers to be able to adapt the schemas to their capabilities, e.g. not supporting the third dimension or not supporting WKT2 by removing these parts from the schema. It's then still pretty much the same base schema, but clients can read from the schema what is missing. This issue occurs more often in openEO than in OAP because of the high number of pre-defined processes.
3. Lastly, openEO clients still mostly use the non-subtype schema for making sense of the schema, but in some cases you just need a separate hint to indicate how the UI is meant to be rendered. So this is more an additive thing than the foundation. Starting with only the subtype makes it pretty foundational, which was never the purpose.
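
The second reason can be sketched as follows: because the full schema is still present, a client can read from it which optional parts a server kept. This is only an illustration based on the bounding-box schema quoted above; the helper `bbox_capabilities` is hypothetical.

```python
def bbox_capabilities(schema: dict) -> dict:
    """Report which optional parts of a bounding-box schema a server kept."""
    props = schema.get("properties", {})
    crs = props.get("crs", {})
    # The crs property may list alternatives via anyOf or be a single schema.
    alternatives = crs.get("anyOf", [crs] if crs else [])
    crs_subtypes = {alt["subtype"] for alt in alternatives if "subtype" in alt}
    return {
        "is_bbox": schema.get("subtype") == "bounding-box",
        "three_d": "base" in props and "height" in props,
        "epsg": "epsg-code" in crs_subtypes,
        "wkt2": "wkt2-definition" in crs_subtypes,
    }
```

With a schema that consists only of the subtype, the same function can report nothing beyond `is_bbox`, which is the trade-off described in the third reason.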

Panagiotis (Peter) A. Vretanos · Answer 6 · Mon Apr 29 2024 19:53:36 GMT+0800 (China Standard Time)

@m-mohr thanks for that. I get it. So, rest of the SWG. Would you prefer the "subtype" approach used by OpenEO or the "$id" approach proposed by @fmigneault? Personally I have no strong preference one way or the other but closer alignment between OpenEO and OAProc would be nice. Please make your preferences known.

Francis Charette-Migneault · Answer 7 · Mon Apr 29 2024 22:09:03 GMT+0800 (China Standard Time)

SWG Meeting 2024-04-29: Leaning toward format since it is already employed for similar use cases in https://docs.ogc.org/is/18-062r2/18-062r2.html#toc36 (see table 13), but more discussion needed.

Matthias Mohr · Answer 8 · Mon Apr 29 2024 22:37:52 GMT+0800 (China Standard Time)

Two points for consideration for the format:

1. JSON Schema doesn't clearly say yet whether format only works for "string"-types properties yet (see also json-schema-org/website#187 )
2. Some validators fail if you provide them unknown formats, which is not quite the intention here, I think. If something is unknown it should just be ignored.

Clemens Portele · Answer 9 · Tue Apr 30 2024 00:08:55 GMT+0800 (China Standard Time)

@m-mohr

I do not think that your points are arguments against the use of "format".

Regarding

JSON Schema doesn't clearly say yet whether format only works for "string"-types properties yet

That statement is at least outdated. JSON Schema Validation 2020-12 is pretty clear:

All format attributes defined in this section apply to strings, but a format attribute can be specified to apply to any instance types defined in the data model defined in the core JSON Schema.

Regarding:

Some validators fail if you provide them unknown formats, which is not quite the intention here, I think. If something is unknown it should just be ignored.

I have never used such a validator, all that I have used (Java and JavaScript implementations) do not show this behavior. A question is how important are those validators - also given that they do not properly implement JSON Schema Validation 2020-12 (asserting format is optional and has to be disabled by default):

Implementations MAY still treat "format" as an assertion in addition to an annotation and attempt to validate the value's conformance to the specified semantics. The implementation MUST provide options to enable and disable such evaluation and MUST be disabled by default. Implementations SHOULD document their level of support for such validation.

That said, a new annotation is of course also an option.

If a new annotation is used, the keyword should start with "x-". Or probably "x-ogc-" as we have used in OGC API Features Part 5, Schemas, to reduce the risk of keyword name clashes.

From the current JSON Schema plans:

In order to support future-compatibility, keywords which are not known by the implementation MUST be disallowed.

The keyword prefix x- defines a safe space for users to introduce custom annotations without the need for an explicit custom keyword.

Implementations MUST refuse to process schemas which contain unknown keywords.

Matthias Mohr · Answer 10 · Tue Apr 30 2024 00:11:33 GMT+0800 (China Standard Time)

Yeah, subtype in openEO was defined before the x- recommendation was in place, at a time when removing format was being debated and ajv (the primary JS validator) still errored for unknown formats. Good that this is not the case anymore. We actually were on format before as well, but were worried about a potential removal of format, so we went with a separate keyword. So the concern is probably not as relevant anymore and format seems fine (assuming we are using the new drafts; openEO is still on draft-07).

Francis Charette-Migneault · Answer 11 · Tue Apr 30 2024 00:18:47 GMT+0800 (China Standard Time)

I agree with @cportele about the points. If implementations misbehave with format, it's up to them to fix their code. Format is a properly defined field with the exact purpose we are looking for:

Structural validation alone may be insufficient to allow an application to correctly utilize certain values. The "format" annotation keyword is defined to allow schema authors to convey semantic information for a fixed subset of values which are accurately described by authoritative resources, be they RFCs or other external specifications.
https://json-schema.org/draft/2020-12/json-schema-validation#name-foreword

For that same reason, I would rather use format than yet another custom field. If we add some x-ogc- field, implementations will have to look under many locations instead of using format's behavior that is already described in JSON schema and the OAP specification.

Clemens Portele · Answer 12 · Tue Apr 30 2024 14:30:53 GMT+0800 (China Standard Time)

In JSON-FG we also use ajv to validate all examples. This is done without asserting "format", but treating it as the annotation that it now is.

I have looked at ajv and indeed it seems to not strictly conform to the JSON Schema spec, since you have to explicitly state validateFormats: false (i.e., false is not the default as required by the spec). And "by default unknown formats throw exception during schema compilation." So, ajv should only be used with validateFormats: false - or alternatively with proper configuration for all the format values.

In general, I always disable "format" validation when validating JSON instances. The behavior is too different across implementations. It should be handled as an annotation, which it now is in the spec.
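
Treating format purely as an annotation can be sketched without any validator at all: a client walks the schema, collects the format values, and ignores the ones it does not know. The following Python sketch uses only the standard language features; the helper names are hypothetical, and the walk is naive in that it also descends into keywords such as default or examples.

```python
def _escape(token: str) -> str:
    # JSON Pointer (RFC 6901) escaping for "~" and "/"
    return token.replace("~", "~0").replace("/", "~1")


def collect_formats(schema, pointer=""):
    """Yield (JSON Pointer, format) pairs; no instance value is validated."""
    if isinstance(schema, dict):
        fmt = schema.get("format")
        if isinstance(fmt, str):
            yield pointer, fmt
        for key, value in schema.items():
            yield from collect_formats(value, f"{pointer}/{_escape(str(key))}")
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            yield from collect_formats(item, f"{pointer}/{index}")


def split_hints(schema, known_formats):
    """Separate format hints a client understands from those it ignores."""
    known, ignored = {}, {}
    for pointer, fmt in collect_formats(schema):
        (known if fmt in known_formats else ignored)[pointer] = fmt
    return known, ignored
```

Unknown values end up in `ignored` instead of raising an error, which matches the "if something is unknown it should just be ignored" expectation raised earlier in the thread.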

Panagiotis (Peter) A. Vretanos · Answer 13 · Mon May 27 2024 20:48:31 GMT+0800 (China Standard Time)

Took a look at OpenEO and they have the following subtypes defined:

"band-name"
"bounding-box"
"chunk-size"
"collection-id"
"datacube"
"epsg-code"
"file-path"
"file-paths"
"geojson"
"input-format"
"input-format-options"
"kernel"
"labeled-array"
"metadata-filter"
"output-format"
"output-format-options"
"process-graph"
"raster-cube"
"temporal-interval"
"temporal-intervals"
"udf-code"
"udf-runtime"
"udf-runtime-version"
"vector-cube"
"wkt2-definition"
"year"

They also have date-time, date, time, duration and uri defined, which seem to be duplicates of the values defined for the JSON Schema format parameter.

Assuming that we intend to use the format parameter to provide "subtype" hints we would need to expand "Table 15 — Additional values for the JSON schema format key for OGC Process Description" with additional values and probably meta-schemas (like OpenEO does).

So my question is, which additional values should we add? Or do we not need to add any values at all and instead simply provide some informative guidance indicating that the format parameter can be used to provide subtype hints and that, if you use it, you should define a vocabulary ... or both (i.e. define some minimal set of values AND provide informative guidance)?

Looking at the OpenEO list, some of these are purely OpenEO specific (e.g. udf-code, udf-runtime, udf-runtime-version) but others seem pretty generic.

I await your feedback.

Matthias Mohr · Answer 14 · Mon May 27 2024 22:57:53 GMT+0800 (China Standard Time)

Many of them are indeed pretty openEO specific and evolved from the specific use cases and process definitions.
It would probably make sense to look at process definitions of OAP and see what is commonly used.

What could probably make sense is

date-time / date / year / duration / temporal-interval
epsg-code / wkt2-definition / ...
bounding-box
geojson
an OGC API adapted equivalent for collection-id
an OGC API equivalent for metadata-filter (i.e. for CQL2 Text and/or JSON)
...

Generally, maybe this should be a best practice rather than a standard so that it can evolve in a more agile way. The standards can link to it though.

Matthias Mohr · Answer 15 · Tue Jul 16 2024 18:49:11 GMT+0800 (China Standard Time)

Just as a note: I found x-ogc-role in Features - Part 5, which seems to have a very similar / the same purpose compared to what was proposed here.

Panagiotis (Peter) A. Vretanos · Answer 16 · Mon Jul 22 2024 20:00:50 GMT+0800 (China Standard Time)

@m-mohr one issue with using Part 5 is that Part 5 deals with "logical" schemas. The use of x-ogc-role is to tag each property with a role not a type. That is, property "X" is the "id" (i.e. the primary identifier) and property "Y" is the "primary-instant" (temporally), etc. It is more like schema constraints in SQL than some sort of type indication. Property "X" and property "Y" can be any type at all.

Is this what you are looking for? If yes, then we can adopt x-ogc-role. If not then I would propose we extend what we already have which is the JSON-Schema format tag.
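
The distinction can be sketched in a few lines of Python. The role values "id" and "primary-instant" are the ones mentioned above; the property names and the helper `property_with_role` are made up for illustration. Note that the role says which property plays a given part in the schema, while the type of that property is left open.

```python
# Hypothetical logical schema; property names are invented for illustration.
STATION_SCHEMA = {
    "type": "object",
    "properties": {
        # The role marks this property as the primary identifier.
        # Its type could just as well be a string.
        "station_no": {"type": "integer", "x-ogc-role": "id"},
        "observed": {
            "type": "string",
            "format": "date-time",  # type indication
            "x-ogc-role": "primary-instant",  # role within the schema
        },
        "name": {"type": "string"},
    },
}


def property_with_role(schema: dict, role: str):
    """Return the name of the first property tagged with `role`, else None."""
    for name, prop in schema.get("properties", {}).items():
        if prop.get("x-ogc-role") == role:
            return name
    return None
```

Here `format` answers "what kind of value is this?", whereas `x-ogc-role` answers "which property is the identifier or the primary time?", so the two keywords can sit side by side on the same property.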

Matthias Mohr · Answer 17 · Mon Jul 22 2024 20:15:01 GMT+0800 (China Standard Time)

Seems I misunderstood the x-ogc-role. What I intended to propose here seems closer to format then.

Generally, I found part 5 pretty confusing, maybe because it mixes concerns...

Benjamin Pross · Answer 18 · Mon Jul 22 2024 21:50:02 GMT+0800 (China Standard Time)

SWG meeting from 2024-07-22: We agreed to use the format-element. Please comment on additional types that you would like to see included. In the SWG we discussed adding extended collections, {map, coverage,...}, code list annotations and annotations for WKT representations.

sptillma · Answer 19 · Tue Jul 23 2024 02:01:11 GMT+0800 (China Standard Time)

@pvretano
A couple of other types I thought about were...

Boolean
Object (or group) - where an input required multiple inputs like a geometry and CRS
We have used this library for our stuff up to this point - I'm not suggesting we use it but rather I thought we might get some ideas from it: https://rjsf-team.github.io/react-jsonschema-form/docs/api-reference/uiSchema/

Francis Charette-Migneault · Answer 20 · Tue Jul 23 2024 03:54:24 GMT+0800 (China Standard Time)

@sptillma
Is there a case where "Boolean" cannot be handled by the type: boolean directly?
If this is referring to some string boolean-like value, such as "TRUE", "OK", "YES", "NO", 1, 0, then a more explicit schema using enum sounds more effective.

Is there some specific geometry you have in mind?
There is currently ogc-bbox, and a few other variants like geojson-feature-collection for more specific structures (https://docs.ogc.org/DRAFTS/18-062.html#_rec_ogc-process-description_format-key).

I don't think it is a good idea to have format: object or format: group, since that is not really more useful than type: {}. It's a "catch-all" definition that doesn't inform more about what is expected for that input.

Matthias Mohr · Answer 21 · Tue Jul 23 2024 04:58:17 GMT+0800 (China Standard Time)

Please comment on additional types that you would like to see included.

See #395 (comment)