stac-utils / pystac

Python library for working with any SpatioTemporal Asset Catalog (STAC)

Home Page: https://pystac.readthedocs.io

Permissively deserialize invalid temporal extents

giswqs opened this issue

I have been using the following code to process the Maxar Open Data catalog for several months. It worked until today, but now it raises a ValueError ("ISO string too short"). I am not sure whether this is a pystac or an isoparser issue.

from pystac import Catalog
url = "https://maxar-opendata.s3.amazonaws.com/events/catalog.json"
root_catalog = Catalog.from_file(url)
collections = root_catalog.get_collections()
collections = [collection.id for collection in collections]

Looks like (at least) one of their collections has invalid temporal extents, and needs to be corrected on their side:

$ curl -s https://maxar-opendata.s3.amazonaws.com/events/BayofBengal-Cyclone-Mocha-May-23/collection.json | jq .extent.temporal.interval
[
  "2023-01-03 04:30:17Z",
  "2023-05-22 04:35:25Z"
]

.extent.temporal.interval should be a list of lists: https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#temporal-extent-object
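
To make the expected shape concrete, here is a minimal sketch of a structural check. The helper name is_list_of_intervals is hypothetical (it is not part of pystac), and the timestamps in the valid example are illustrative values only. The check looks only at nesting, not at whether each string is a well-formed datetime.

```python
from typing import Any


def is_list_of_intervals(interval: Any) -> bool:
    """Return True if `interval` is a non-empty list of [start, end] pairs.

    Hypothetical helper for illustration; each bound may be a string or None
    (an open-ended bound).
    """
    if not isinstance(interval, list) or not interval:
        return False
    return all(
        isinstance(pair, list)
        and len(pair) == 2
        and all(bound is None or isinstance(bound, str) for bound in pair)
        for pair in interval
    )


# The flat list of strings reported above (invalid shape).
invalid = ["2023-01-03 04:30:17Z", "2023-05-22 04:35:25Z"]

# The same kind of extent wrapped as a list of lists (expected shape).
valid = [["2023-01-03T04:30:17Z", "2023-05-22T04:35:25Z"]]
```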

That being said, this is a common problem. On deserialization, we should probably permissively correct the problem with a warning. Leaving this open to track that need.
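
As a rough sketch of what "permissively correct the problem with a warning" could look like, assuming the only malformed case handled is a flat two-element list of strings (or nulls): the function name normalize_temporal_interval and the warning text are hypothetical, and this is not necessarily how pystac implements the fix.

```python
import warnings
from typing import Any, List


def normalize_temporal_interval(interval: List[Any]) -> List[Any]:
    """Wrap a flat [start, end] interval in an outer list, with a warning.

    Hypothetical helper for illustration only. Intervals that are already
    a list of lists are returned unchanged.
    """
    is_flat_pair = len(interval) == 2 and all(
        bound is None or isinstance(bound, str) for bound in interval
    )
    if is_flat_pair:
        warnings.warn(
            "Temporal extent interval is a list of strings, not a list of "
            "lists; wrapping it and continuing.",
            UserWarning,
            stacklevel=2,
        )
        return [list(interval)]
    return interval
```

With this sketch, ["2023-01-03 04:30:17Z", "2023-05-22 04:35:25Z"] becomes a single-element list containing that pair, while an already nested interval passes through untouched and emits no warning.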

@giswqs I just checked your test script against #1222, and it looks like that fixes it:

$ cat > test.py
from pystac import Catalog
url = "https://maxar-opendata.s3.amazonaws.com/events/catalog.json"
root_catalog = Catalog.from_file(url)
collections = root_catalog.get_collections()
collections = [collection.id for collection in collections]
$ python test.py 
/Users/gadomski/Code/stac-utils/pystac/pystac/collection.py:264: UserWarning: A collection's temporal extent should be a list of lists, but is instead a list of strings. pystac is fixing this issue and continuing deserialization, but note that the source collection is invalid STAC.
  warnings.warn(
$

So you can work from that branch until we're able to release an update.
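
If you want to keep track of which loads hit this warning rather than just seeing it on stderr, one option is to record warnings with the standard library. This is a minimal sketch: load_collection_ids is a hypothetical stand-in for the pystac calls in test.py (it only emits a similar UserWarning and returns a made-up ID), so the snippet runs without network access.

```python
import warnings
from typing import List


def load_collection_ids() -> List[str]:
    # Hypothetical stand-in for the catalog traversal in test.py. It emits
    # a UserWarning like the one shown above and returns an example ID.
    warnings.warn(
        "A collection's temporal extent should be a list of lists, "
        "but is instead a list of strings.",
        UserWarning,
    )
    return ["example-collection"]


# Record warnings instead of printing them, so they can be inspected later.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ids = load_collection_ids()

# Keep only the messages from UserWarning (or its subclasses).
invalid_stac_warnings = [
    str(w.message) for w in caught if issubclass(w.category, UserWarning)
]
```

Replacing simplefilter("always") with simplefilter("error") would instead turn the warning into an exception, which may be useful if invalid source data should stop the run.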

@gadomski Awesome! Thank you very much for the quick fix.