stac-utils / pystac

Python library for working with any SpatioTemporal Asset Catalog (STAC)

Home page: https://pystac.readthedocs.io



get_child_links/get_item_links: Ensure correct media type

m-mohr opened this issue.

It looks like get_child_links, for example, doesn't check the media type.

So if I have child links to STAC Catalogs/Collections and to HTML files that render those child STAC entities, is it intentional that I also get the HTML files? I see this pattern a lot in OGC API-based implementations.
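To make the situation concrete, here is a minimal sketch of a catalog whose children are each linked twice: once as JSON and once as an HTML rendering. It is written in JavaScript only to match the STAC Browser snippet later in this thread; it is not pystac code, and the catalog contents and the `childLinksByRel` helper are hypothetical. Selecting on `rel` alone returns both encodings.

```javascript
// Hypothetical catalog: each child is linked once as JSON and once as HTML.
const catalog = {
  type: 'Catalog',
  links: [
    { rel: 'self', href: 'https://example.com/catalog.json', type: 'application/json' },
    { rel: 'child', href: 'https://example.com/a/collection.json', type: 'application/json' },
    { rel: 'child', href: 'https://example.com/a/index.html', type: 'text/html' },
    { rel: 'child', href: 'https://example.com/b/collection.json', type: 'application/json' },
    { rel: 'child', href: 'https://example.com/b/index.html', type: 'text/html' },
  ],
};

// Naive selection by relation type only; the media type is ignored.
function childLinksByRel(stac) {
  return stac.links.filter(link => link.rel === 'child');
}

const children = childLinksByRel(catalog);
console.log(children.length); // 4: two JSON documents plus their two HTML renderings
```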

Generally, does pystac support hierarchical links (child, item, self, parent) with types that are not STAC media types? I mean support in a sense that it doesn't screw up or doesn't throw an error (e.g. by ignoring them).


Probably not, and this seems strange, if not incorrect, given the spec wording, which says that a child rel type should be a "URL to a child STAC entity (Catalog or Collection)." So it would surprise me if the target of a child link were NOT a STAC entity.

It is a STAC entity, but in a different encoding (i.e. HTML).

My understanding of the STAC spec is that such a link should still carry the corresponding media type to indicate that its target is a STAC entity.

This all gets pretty interesting when combining STAC with other worlds, like Records. Other things can be items or children in a hierarchical sense, IMHO. Claiming these relation types exclusively for STAC seems like a bold move.

Here's an example of such an implementation: https://api.weather.gc.ca/stac/?f=json

> It is a STAC entity, but in a different encoding (i.e. HTML).

Is this defined somewhere? HTML in particular seems like a strange format for machine-readable STAC metadata. The general point is taken, though: I've encoded STAC metadata in TOML, for example.

Should this issue then be "support other media types for structural links", with deserializers for any other formats that are out there?


No, for me the primary issue is that pystac should not mistakenly try to load STAC from an HTML page and then error (if that's what happens). It should just ignore or pass through such links. Additional support for more file formats would be a separate issue (and a longer-term stretch goal), I think.
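The requested behavior (load STAC links, pass everything else through untouched) can be sketched as below. This is a hypothetical JavaScript illustration, not pystac's API: `resolveLinks`, `isStacLink`, and the caller-supplied `load` function are invented names, and the rule "untyped or JSON/GeoJSON means STAC" is an assumption.

```javascript
// Assumed rule: untyped links, or links typed as JSON/GeoJSON, are STAC.
const STAC_TYPES = ['application/json', 'application/geo+json'];

function isStacLink(link) {
  return !link.type || STAC_TYPES.includes(link.type);
}

// Resolve links of one rel without ever parsing a non-STAC encoding.
// `load` is a hypothetical caller-supplied function: href -> parsed STAC object.
function resolveLinks(stac, rel, load) {
  const loaded = [];
  const passedThrough = [];
  for (const link of stac.links) {
    if (link.rel !== rel) continue;
    if (isStacLink(link)) {
      loaded.push(load(link.href));
    } else {
      // e.g. a text/html rendering: keep the link as-is instead of raising an error
      passedThrough.push(link);
    }
  }
  return { loaded, passedThrough };
}

// In-memory stand-in for HTTP/file I/O; it throws for anything that is not STAC JSON.
const docs = { 'https://example.com/a/collection.json': { type: 'Collection', id: 'a' } };
const load = href => {
  if (!(href in docs)) throw new Error(`cannot parse as STAC: ${href}`);
  return docs[href];
};

const catalog = {
  links: [
    { rel: 'child', href: 'https://example.com/a/collection.json', type: 'application/json' },
    { rel: 'child', href: 'https://example.com/a/index.html', type: 'text/html' },
  ],
};

const { loaded, passedThrough } = resolveLinks(catalog, 'child', load);
console.log(loaded.map(c => c.id));          // [ 'a' ]
console.log(passedThrough.map(l => l.href)); // [ 'https://example.com/a/index.html' ]
```

Because the HTML link is never handed to `load`, the stand-in loader's error path is never reached; the link stays available to callers that want it.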

> Is this defined somewhere?

It's recommended in OGC API - Records, which we try to align with. I don't think it's defined anywhere, but it's not explicitly forbidden anywhere either. And I think back in the day, Chris always asked us to provide HTML representations alongside JSON to allow crawling by Google and other search engines. So I think this is a very reasonable use case. (I also added a bit more context in radiantearth/stac-spec#1259.)

Regarding the implementation, this is the check I use in STAC Browser:

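```javascript
const stacTypes = ['application/geo+json', 'application/json'];
// `stac` is the already-loaded STAC object; links without a type are assumed to be STAC
const stacItems = stac.links.filter(link => link.rel === 'items' && (!link.type || stacTypes.includes(link.type)));
```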

That seems to work with all implementations I've encountered so far. It ensures that the link has a suitable media type, but also assumes that, in a STAC context, a link without a media type points to STAC.
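The same rule could be generalized from `items` to the other structural relation types. The sketch below is a hypothetical variant, not STAC Browser or pystac code: `filterStacLinks` and `mediaTypeEssence` are invented names. It also compares only the type/subtype part of the media type, because an exact string comparison would treat a parameterized type such as `application/geo+json; charset=utf-8` as non-STAC. Whether parameters should be ignored this way is an assumption.

```javascript
// Media types treated as STAC JSON (same list as the STAC Browser check).
const stacTypes = ['application/geo+json', 'application/json'];

// Reduce e.g. "application/json; charset=utf-8" to "application/json".
function mediaTypeEssence(type) {
  return type.split(';')[0].trim().toLowerCase();
}

// Keep links whose rel is in `rels` and that either have no type
// (assumed to be STAC in a STAC context) or a STAC JSON media type.
function filterStacLinks(links, rels) {
  return links.filter(link =>
    rels.includes(link.rel) &&
    (!link.type || stacTypes.includes(mediaTypeEssence(link.type))));
}

const links = [
  { rel: 'child', href: './a/collection.json', type: 'application/json' },
  { rel: 'child', href: './a/index.html', type: 'text/html' },
  { rel: 'child', href: './b/catalog.json' },
  { rel: 'item', href: './items/x.json', type: 'application/geo+json; charset=utf-8' },
  { rel: 'license', href: './LICENSE', type: 'text/plain' },
];

const selected = filterStacLinks(links, ['child', 'item']);
console.log(selected.map(link => link.href));
// ['./a/collection.json', './b/catalog.json', './items/x.json']
```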

As for whether it's defined anywhere: it is. The base of all OGC API specifications, called "OGC API - Common", has an HTML requirement class.

While this class is optional, implementing it is recommended. Quoting:

Therefore, sharing data on the Web should include publication in HTML. To be consistent with the Web, this publication should be done in a way that enables users and search engines to discover and access all of the data.
This is discussed in detail in the W3C/OGC SDW Best Practice. Therefore, the OGC API — Common Standard recommends supporting HTML as an encoding.