stac-utils / pystac

Python library for working with any SpatioTemporal Asset Catalog (STAC)

Home Page:https://pystac.readthedocs.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

SpatialExtent value type checking allows creation of invalid Collections

bmcandr opened this issue · comments

The type hints in the class constructor of SpatialExtent indicate that either List[float] | List[List[float]] are expected:

bboxes: list[list[float]] | list[float],

But there is minimal validation around this which allows for the creation Collections with malformed SpatialExtents:

# A common mistake is to pass in a single bbox instead of a list of bboxes.
# Account for this by transforming the input in that case.
if isinstance(bboxes, list) and isinstance(bboxes[0], float):
self.bboxes: list[list[float]] = [cast(list[float], bboxes)]
else:
self.bboxes = cast(list[list[float]], bboxes)

For example, a bbox containing a list of integers will not be cast to a nested list because the isinstance(bboxes[0], float) type check fails. While we should be obeying the type hints, we have inadvertently created Collections that fail validation this way.

The STAC spec indicates the type for bbox is [[number]] which is either an integer or float so it seems like the "validation" here should properly handle a list of integers as well.

Is there any appetite for more robust type-checking here? Properly handle float/int, fail for other types? If so, I'm happy to put together a merge request!

Reopened after reading a bit more about typing.cast. I understand that it is used to satisfy type-checkers and does not actually alter the value passed into it.

typing.cast(typ, val)
Cast a value to a type.

This returns the value unchanged. To the type checker this signals that the return value has the designated type, but at runtime we intentionally don’t check anything (we want this to be as fast as possible).

However, the comment in the snippet above suggests the logic is intended to "transform" the bboxes into a list of bboxes if not supplied in that format. Is the code truth or the comment? If the latter (my bet), then I think my comment still stands and this code is not working as intended.

Yes casting is pretty confusing. I agree that it seems like the commenter intended to alter the type of the input. More robust coercion to the correct type would be great. PRs welcome!