Permissively deserialize invalid temporal extents
giswqs opened this issue · comments
I have been using the following code to process the Maxar Open Data catalog for several months now. It worked until today, but now it throws ValueError: ISO string too short. I am not sure if this is a pystac or isoparser issue.
from pystac import Catalog
url = "https://maxar-opendata.s3.amazonaws.com/events/catalog.json"
root_catalog = Catalog.from_file(url)
collections = root_catalog.get_collections()
collections = [collection.id for collection in collections]
Looks like (at least) one of their collections has invalid temporal extents, and needs to be corrected on their side:
$ curl -s https://maxar-opendata.s3.amazonaws.com/events/BayofBengal-Cyclone-Mocha-May-23/collection.json | jq .extent.temporal.interval
[
"2023-01-03 04:30:17Z",
"2023-05-22 04:35:25Z"
]
.extent.temporal.interval should be a list of lists: https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md#temporal-extent-object
That being said, this is a common problem. On deserialization, we should probably permissively correct the problem with a warning. Leaving this open to track that need.
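For illustration, a permissive correction along those lines might look like the sketch below. This is not pystac's actual implementation: the helper name normalize_temporal_interval and the warning text are hypothetical, and the sketch assumes the only malformation is a flat [start, end] pair where a list of such pairs is expected.

```python
import warnings
from typing import Any, List


def normalize_temporal_interval(interval: List[Any]) -> List[Any]:
    """Hypothetical helper: wrap a flat [start, end] pair as [[start, end]].

    A valid STAC temporal interval is a list of [start, end] pairs.
    If we instead see exactly two strings (or nulls), assume the outer
    list is missing, warn, and add it. Anything else is returned as-is.
    """
    is_flat_pair = len(interval) == 2 and all(
        value is None or isinstance(value, str) for value in interval
    )
    if is_flat_pair:
        warnings.warn(
            "Temporal interval is a flat list of strings, not a list of "
            "lists; wrapping it and continuing. The source is invalid STAC.",
            UserWarning,
        )
        return [list(interval)]
    return interval


# The malformed value from the Maxar collection above.
flat = ["2023-01-03 04:30:17Z", "2023-05-22 04:35:25Z"]
fixed = normalize_temporal_interval(flat)
```

Already-valid input such as [["2023-01-03 04:30:17Z", "2023-05-22 04:35:25Z"]] passes through unchanged and without a warning, so a check like this could run on every collection during deserialization.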
@giswqs I just checked your test script against #1222, and it looks like that fixes it:
$ cat > test.py
from pystac import Catalog
url = "https://maxar-opendata.s3.amazonaws.com/events/catalog.json"
root_catalog = Catalog.from_file(url)
collections = root_catalog.get_collections()
collections = [collection.id for collection in collections]
$ python test.py
/Users/gadomski/Code/stac-utils/pystac/pystac/collection.py:264: UserWarning: A collection's temporal extent should be a list of lists, but is instead a list of strings. pystac is fixing this issue and continuing deserialization, but note that the source collection is invalid STAC.
warnings.warn(
$
So you can work from that branch until we're able to release an update.
@gadomski Awesome! Thank you very much for the quick fix.