opengeospatial / ogcapi-processes

Home Page: https://ogcapi.ogc.org/processes

Type indications for user interfaces in parameter schemas

m-mohr opened this issue · comments

We recently concluded Testbed 19 and had a work item where we tried to combine openEO and OGC API - Processes. I worked on a visual client https://m-mohr.github.io/gdc-web-editor/

What we realized is that UI generation for processes exposed by OGC APIs is much harder, as JSON Schema alone gives us only limited indications. In openEO we started a list of "subtypes" that define common types of input, e.g. collection, bbox, date, duration, epsg-code, geojson, wkt2, year, etc. This helps to render a much more user-friendly UI. For example, where a bbox would usually just be rendered as a list of 4 numbers, the UI can now render a map where you can select a bbox. Some non-visual clients also benefit from it. See https://github.com/Open-EO/openeo-processes/blob/master/meta/subtype-schemas.json for a list of subtypes. This is open to extensions.
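A minimal Python sketch of the idea, assuming a client that maps subtypes to widgets. The widget names and the `pick_widget` helper are hypothetical and not taken from the GDC Web Editor or openEO; only the subtype names come from the openEO list.

```python
# Hypothetical mapping from openEO-style subtypes to UI widgets.
SUBTYPE_WIDGETS = {
    "bounding-box": "map-bbox-selector",
    "date": "date-picker",
    "epsg-code": "crs-search-field",
    "collection-id": "collection-dropdown",
}

# Generic fallbacks keyed by the plain JSON Schema type.
TYPE_WIDGETS = {
    "number": "number-input",
    "integer": "number-input",
    "string": "text-input",
    "boolean": "checkbox",
    "array": "list-editor",
    "object": "json-editor",
}


def pick_widget(schema: dict) -> str:
    """Prefer the subtype hint; fall back to the JSON Schema type."""
    subtype = schema.get("subtype")
    if subtype in SUBTYPE_WIDGETS:
        return SUBTYPE_WIDGETS[subtype]
    json_type = schema.get("type")
    if isinstance(json_type, list):
        # e.g. ["number", "null"]: use the first non-null type
        json_type = next((t for t in json_type if t != "null"), None)
    return TYPE_WIDGETS.get(json_type, "json-editor")
```

An unknown subtype is simply ignored here, so the client degrades to the generic widget for the underlying type.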

The subtype is a custom JSON Schema keyword and has its own meta-schema, so it also validates in JSON Schema validators. While I think it might not be something for the spec itself, it certainly could go into a best practice.

I think this is something that OGC API - Processes currently doesn't have, so it could be something to align between OGC API - Processes and openEO. We now have a client (GDC Web Editor) that can connect to both OGC API - Processes and openEO and I'd hope we see more in the future. Such subtypes would be greatly beneficial and make OGC APIs much more accessible to non-coders.

Example:
[Screenshot: generated UI for the load_collection process]

This is the process description for it:
https://github.com/Open-EO/openeo-processes/blob/master/load_collection.json
Search for the subtype properties in the parameter schemas.

The map, date and the band selection wouldn't be possible as such without subtypes. Only the date selection could be achieved somewhat through the format keyword.

Thoughts?

I'm curious about the decision around subtype.
Why not instead use $ref with a reference to a well established JSON-schema with #<definition-name> to pick the specific definition? Could also make use of the $id property of JSON-schema to identify a known type from a definition catalog.

My main point here is about having more specific type indications, not necessarily a specific "encoding". So whether it's called subtype, $ref, format or something else is not so important for me.

But to give some background:

  • We started initially with the format keyword, but back in the day there were discussions in JSON Schema to deprecate format, so we tried to avoid format and decided to define a new JSON Schema keyword with a meta-schema to validate it.
  • About $ref:
    1. In openEO we want processes to be self-contained because otherwise they are pretty much indigestible by web clients (unless you want to spawn dozens of HTTP requests to resolve them all). One request to GET /processes was meant to provide you with all the information needed for process graph construction.
    2. Another point is that in draft-07 of JSON Schema (2019/2020 didn't exist at the time we defined openEO), $ref couldn't live alongside other properties, so that makes things more complicated and doesn't easily allow customizing/restricting subtypes. Yes, there's allOf, but for simplicity we try to avoid allOf, oneOf and anyOf as much as possible.
    3. I guess we could've implied specific types by the URL in the $ref, but honestly we also just didn't think about it. Probably mostly due to point 2.
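Point 2 can be sketched as follows. This is an illustration only: `restrict_ref` is a hypothetical helper and the `$ref` URL is a placeholder. In draft-07, keywords next to `$ref` are ignored, so restricting a referenced schema needs an `allOf` wrapper; from 2019-09 on, they can sit side by side.

```python
def restrict_ref(ref: str, restrictions: dict, draft: str = "draft-07") -> dict:
    """Build a schema that references `ref` and adds extra restrictions."""
    if draft == "draft-07":
        # Siblings of $ref are ignored in draft-07, so wrap both in allOf.
        return {"allOf": [{"$ref": ref}, restrictions]}
    # 2019-09 and later: $ref may live alongside other keywords.
    return {"$ref": ref, **restrictions}


# Placeholder URL, not a real schema location.
BBOX_REF = "https://example.com/schemas/bounding-box.json"

draft07_schema = restrict_ref(BBOX_REF, {"required": ["crs"]})
newer_schema = restrict_ref(BBOX_REF, {"required": ["crs"]}, draft="2020-12")
```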

I agree with having specific identifiers. Even for something as simple as a bounding box, it should be made obvious to distinguish it from any other array of numbers. A naming authority should be used as well to provide more context, linking references, and definition details. In the case of a bounding box, for example, I would love to have processes indicate something similar to:

"$id": "http://www.opengis.net/def/glossary/term/BoundingBox"

I like the "$id"/subtype/(or whatever we call it) idea and I am trying to figure out what to do to resolve this issue but I am confused about something.

In the example that @m-mohr cites, https://github.com/Open-EO/openeo-processes/blob/master/load_collection.json, I see things like this (for a bounding box in this case):

{
  "title": "Bounding Box",
  "type": "object",
  "subtype": "bounding-box",
  "required": [
    "west",
    "south",
    "east",
    "north"
  ],
  "properties": {
    "west": {
      "description": "West (lower left corner, coordinate axis 1).",
      "type": "number"
    },
    "south": {
      "description": "South (lower left corner, coordinate axis 2).",
      "type": "number"
    },
    "east": {
      "description": "East (upper right corner, coordinate axis 1).",
      "type": "number"
    },
    "north": {
      "description": "North (upper right corner, coordinate axis 2).",
      "type": "number"
    },
    "base": {
      "description": "Base (optional, lower left corner, coordinate axis 3).",
      "type": [
        "number",
        "null"
      ],
      "default": null
    },
    "height": {
      "description": "Height (optional, upper right corner, coordinate axis 3).",
      "type": [
        "number",
        "null"
      ],
      "default": null
    },
    "crs": {
      "description": "Coordinate reference system of the extent, specified as as [EPSG code](http://www.epsg-registry.org/) or [WKT2 CRS string](http://docs.opengeospatial.org/is/18-010r7/18-010r7.html). Defaults to `4326` (EPSG code 4326) unless the client explicitly requests a different coordinate reference system.",
      "anyOf": [
        {
          "title": "EPSG Code",
          "type": "integer",
          "subtype": "epsg-code",
          "minimum": 1000,
          "examples": [
            3857
          ]
        },
        {
          "title": "WKT2",
          "type": "string",
          "subtype": "wkt2-definition"
        }
      ],
      "default": 4326
    }
  }
},

If there is an "$id" or "subtype" defined for this, why would I need ALL this schema? Why would the input's schema not just be (using "subtype" as the identifier token):

{
  "subtype": "bounding-box"
}

Presumably the identifier "bounding-box" would imply ALL the rest. No?

@pvretano There are three main reasons for that in openEO, but it is a design decision that could be made differently in OAP:

  1. We didn't want to force implementations to have clients resolve external references. See also #395 (comment). (Similarly, we recommend for /queryables that servers resolve $refs before sending them to the clients.)
  2. We wanted implementers to be able to adapt their implementations based on their capabilities, e.g. not support the third dimension or not support WKT2 by removing these parts from the schema. It's then still pretty much the same base schema, but clients can read from the schema what is missing. This issue occurs more often in openEO compared to OAP because of the high number of pre-defined processes.
  3. Lastly, openEO clients still mostly use the non-subtype schema for making sense of the schema, but in some cases you just need a separate hint to indicate how the UI is meant to be rendered. So this is more an additive thing rather than the foundation. Starting with only the subtype makes it pretty foundational, which was never the purpose.
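To make point 2 concrete, here is a rough sketch of a server trimming the bounding-box schema to its capabilities while keeping the subtype hint. `trim_bbox` and `BBOX_SCHEMA` are hypothetical names, and the schema is a shortened version of the one quoted above.

```python
import copy

# Shortened version of the openEO bounding-box schema quoted earlier.
BBOX_SCHEMA = {
    "type": "object",
    "subtype": "bounding-box",
    "required": ["west", "south", "east", "north"],
    "properties": {
        "west": {"type": "number"},
        "south": {"type": "number"},
        "east": {"type": "number"},
        "north": {"type": "number"},
        "base": {"type": ["number", "null"], "default": None},
        "height": {"type": ["number", "null"], "default": None},
        "crs": {
            "anyOf": [
                {"type": "integer", "subtype": "epsg-code"},
                {"type": "string", "subtype": "wkt2-definition"},
            ],
            "default": 4326,
        },
    },
}


def trim_bbox(schema: dict, support_3d: bool = True, support_wkt2: bool = True) -> dict:
    """Return a copy of the schema without the parts a server does not support."""
    trimmed = copy.deepcopy(schema)
    props = trimmed["properties"]
    if not support_3d:
        props.pop("base", None)
        props.pop("height", None)
    if not support_wkt2:
        props["crs"]["anyOf"] = [
            option for option in props["crs"]["anyOf"]
            if option.get("subtype") != "wkt2-definition"
        ]
    return trimmed


trimmed = trim_bbox(BBOX_SCHEMA, support_3d=False, support_wkt2=False)
```

A client can still recognise the input as a bounding box from the subtype, and can tell from the remaining properties that 3D extents and WKT2 are not offered.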

@m-mohr thanks for that. I get it. So, rest of the SWG. Would you prefer the "subtype" approach used by OpenEO or the "$id" approach proposed by @fmigneault? Personally I have no strong preference one way or the other but closer alignment between OpenEO and OAProc would be nice. Please make your preferences known.

SWG Meeting 2024-04-29: Leaning toward format since it is already employed for similar use cases in https://docs.ogc.org/is/18-062r2/18-062r2.html#toc36 (see table 13), but more discussion needed.

Two points for consideration for the format:

  • JSON Schema doesn't clearly say yet whether format only works for "string"-typed properties (see also json-schema-org/website#187)
  • Some validators fail if you provide them unknown formats, which is not quite the intention here, I think. If something is unknown it should just be ignored.

@m-mohr

I do not think that your points are arguments against the use of "format".

Regarding

JSON Schema doesn't clearly say yet whether format only works for "string"-typed properties

That statement is at least outdated. JSON Schema Validation 2020-12 is pretty clear:

All format attributes defined in this section apply to strings, but a format attribute can be specified to apply to any instance types defined in the data model defined in the core JSON Schema.

Regarding:

Some validators fail if you provide them unknown formats, which is not quite the intention here, I think. If something is unknown it should just be ignored.

I have never used such a validator, all that I have used (Java and JavaScript implementations) do not show this behavior. A question is how important are those validators - also given that they do not properly implement JSON Schema Validation 2020-12 (asserting format is optional and has to be disabled by default):

Implementations MAY still treat "format" as an assertion in addition to an annotation and attempt to validate the value's conformance to the specified semantics. The implementation MUST provide options to enable and disable such evaluation and MUST be disabled by default. Implementations SHOULD document their level of support for such validation.

That said, a new annotation is of course also an option.

If a new annotation is used, the keyword should start with "x-". Or probably "x-ogc-" as we have used in OGC API Features Part 5, Schemas, to reduce the risk of keyword name clashes.

From the current JSON Schema plans:

In order to support future-compatibility, keywords which are not known by the implementation MUST be disallowed.

The keyword prefix x- defines a safe space for users to introduce custom annotations without the need for an explicit custom keyword.

Implementations MUST refuse to process schemas which contain unknown keywords.
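Read literally, the rule quoted above could be sketched like this. It is only an illustration of the stated plan, not of any existing validator: `KNOWN_KEYWORDS` is a deliberately tiny, assumed vocabulary, `x-ogc-subtype` is a hypothetical keyword name, and only the top level of the schema is inspected.

```python
# Assumed, intentionally incomplete set of keywords the implementation knows.
KNOWN_KEYWORDS = {
    "type", "title", "description", "format", "properties", "required",
    "items", "enum", "default", "anyOf", "minimum", "examples",
}


def unknown_keywords(schema: dict, known: set = KNOWN_KEYWORDS) -> list:
    """List top-level keywords that are neither known nor in the x- safe space."""
    return sorted(
        key for key in schema
        if key not in known and not key.startswith("x-")
    )


# A bare custom keyword would be rejected under the quoted rule ...
bare = unknown_keywords({"type": "object", "subtype": "bounding-box"})
# ... while an x- prefixed one (hypothetical name) would pass.
prefixed = unknown_keywords({"type": "object", "x-ogc-subtype": "bounding-box"})
```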

Yeah, subtype in openEO was defined before the x- recommendation was in place, while removing format was still being debated, and when ajv (the primary JS validator) still errored for unknown formats. Good that this is not the case anymore. We actually were on format before as well, but were worried about a potential removal of format, so we went with a separate keyword. So that is probably not as relevant anymore and format seems fine (assuming we are using the new drafts; openEO is still on draft-07).

I agree with @cportele about the points. If implementations misbehave with format, it's up to them to fix their code. Format is a properly defined field with the exact purpose we are looking for:

Structural validation alone may be insufficient to allow an application to correctly utilize certain values. The "format" annotation keyword is defined to allow schema authors to convey semantic information for a fixed subset of values which are accurately described by authoritative resources, be they RFCs or other external specifications.
https://json-schema.org/draft/2020-12/json-schema-validation#name-foreword

For that same reason, I would rather use format than yet another custom field. If we add some x-ogc- field, implementations will have to look under many locations instead of using format's behavior that is already described in JSON schema and the OAP specification.

In JSON-FG we also use ajv to validate all examples. This is done without asserting "format", but treating it as the annotation that it now is.

I have looked at ajv and indeed it seems to not strictly conform to the JSON Schema spec, since you have to explicitly state validateFormats: false (i.e., false is not the default as required by the spec). And "by default unknown formats throw exception during schema compilation." So, ajv should only be used with validateFormats: false - or alternatively with proper configuration for all the format values.

In general, I always disable "format" validation when validating JSON instances. The behavior is too different across implementations. It should be handled as an annotation, which it now is in the spec.
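The annotation-first behaviour described above can be sketched in a few lines of Python. This is not how ajv or any particular validator is implemented; `format_ok` and `FORMAT_CHECKS` are hypothetical names, and only `date` has a checker here.

```python
from datetime import date


def _is_date(value) -> bool:
    """Check that a value parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
        return True
    except (TypeError, ValueError):
        return False


# Checkers for the formats this sketch knows about.
FORMAT_CHECKS = {"date": _is_date}


def format_ok(value, schema: dict, assert_formats: bool = False) -> bool:
    """Treat format as an annotation unless assertion is explicitly enabled."""
    fmt = schema.get("format")
    if fmt is None or not assert_formats:
        return True  # annotation only: never fails validation
    check = FORMAT_CHECKS.get(fmt)
    if check is None:
        return True  # unknown format: ignore instead of raising
    return check(value)
```

With the default `assert_formats=False`, a value such as `ogc-bbox` in `format` stays a pure UI hint and cannot break validation.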

Took a look at OpenEO and they have the following subtypes defined:

  • "band-name"
  • "bounding-box"
  • "chunk-size"
  • "collection-id"
  • "datacube"
  • "epsg-code"
  • "file-path"
  • "file-paths"
  • "geojson"
  • "input-format"
  • "input-format-options"
  • "kernel"
  • "labeled-array"
  • "metadata-filter"
  • "output-format"
  • "output-format-options"
  • "process-graph"
  • "raster-cube"
  • "temporal-interval"
  • "temporal-intervals"
  • "udf-code"
  • "udf-runtime"
  • "udf-runtime-version"
  • "vector-cube"
  • "wkt2-definition"
  • "year"

They also have date-time, date, time, duration and uri defined, which seem to be duplicates of the values defined for the JSON Schema format parameter.

Assuming that we intend to use the format parameter to provide "subtype" hints we would need to expand "Table 15 — Additional values for the JSON schema format key for OGC Process Description" with additional values and probably meta-schemas (like OpenEO does).

So my question is, which additional values should we add? Or do we need to add any values at all, and instead simply have some informative guidance indicating that the format parameter can be used to provide subtype hints and, if you use it, define a vocabulary ... or both (i.e. define some minimal set of values AND provide informative guidance).

Looking at the OpenEO list, some of these are purely OpenEO specific (e.g. udf-code, udf-runtime, udf-runtime-version) but others seem pretty generic.

I await your feedback.

Many of them are indeed pretty openEO specific and evolved from the specific use cases and process definitions.
It would probably make sense to look at process definitions of OAP and see what is commonly used.

What could probably make sense is

  • date-time / date / year / duration / temporal-interval
  • epsg-code / wkt2-definition / ...
  • bounding-box
  • geojson
  • an OGC API adapted equivalent for collection-id
  • an OGC API equivalent for metadata-filter (i.e. for CQL2 Text and/or JSON)
  • ...

Generally, maybe this should be a best practice rather than a standard, so that it can evolve in a more agile way. The standards can link to it though.

Just as a note: I found x-ogc-role in Features - Part 5, which seems to have a very similar / the same purpose compared to what was proposed here.

@m-mohr one issue with using Part 5 is that Part 5 deals with "logical" schemas. The use of x-ogc-role is to tag each property with a role not a type. That is, property "X" is the "id" (i.e. the primary identifier) and property "Y" is the "primary-instant" (temporally), etc. It is more like schema constraints in SQL than some sort of type indication. Property "X" and property "Y" can be any type at all.

Is this what you are looking for? If yes, then we can adopt x-ogc-role. If not then I would propose we extend what we already have which is the JSON-Schema format tag.

Seems I misunderstood the x-ogc-role. What I intended to propose here seems closer to format then.

Generally, I found part 5 pretty confusing, maybe because it mixes concerns...

SWG meeting from 2024-07-22: We agreed to use the format-element. Please comment on additional types that you would like to see included. In the SWG we discussed adding extended collections, {map, coverage,...}, code list annotations and annotations for WKT representations.

@pvretano
A couple of other types I thought about were...

  1. Boolean
  2. Object (or group) - where an input requires multiple values, such as a geometry and a CRS

We have used this library for our stuff up to this point - I'm not suggesting we use it but rather I thought we might get some ideas from it: https://rjsf-team.github.io/react-jsonschema-form/docs/api-reference/uiSchema/

@sptillma
Is there a case where "Boolean" cannot be handled by the type: boolean directly?
If this is referring to some string boolean-like value, such as "TRUE", "OK", "YES", "NO", 1, 0, then a more explicit schema using enum sounds more effective.

Is there some specific geometry you have in mind?
There is currently ogc-bbox, and a few other variants like geojson-feature-collection for more specific structures (https://docs.ogc.org/DRAFTS/18-062.html#_rec_ogc-process-description_format-key).

I don't think it is a good idea to have format: object or format: group, since that is not really more useful than type: {}. It's a "catch-all" definition that doesn't inform more about what is expected for that input.
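A small sketch of the argument, assuming a client that picks widgets from `format`, `enum` and `type`. The widget names and `widget_for` are hypothetical; `ogc-bbox` and `geojson-feature-collection` are the format values mentioned above.

```python
# Hypothetical widgets for format values that carry real meaning.
FORMAT_WIDGETS = {
    "ogc-bbox": "map-bbox-selector",
    "geojson-feature-collection": "map-feature-editor",
}


def widget_for(schema: dict) -> str:
    """Pick a widget from format first, then enum, then the plain type."""
    fmt = schema.get("format")
    if fmt in FORMAT_WIDGETS:
        return FORMAT_WIDGETS[fmt]
    if "enum" in schema:
        return "dropdown"  # e.g. boolean-like strings such as "YES" / "NO"
    if schema.get("type") == "boolean":
        return "checkbox"
    return "generic-input"
```

In this sketch a catch-all value such as `format: object` yields the same generic widget as having no format at all, which is the point made above.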

Please comment on additional types that you would like to see included.

See #395 (comment)