azavea / franklin

A STAC/OGC API Features Web Service

Home Page: https://azavea.github.io/franklin/

STAC Search with "ids" and "fields" not working as specified in the STAC spec

adrienDog opened this issue · comments

Describe the bug
I am trying to use the advanced POST /search from the STAC API documentation here

Some queries are not behaving as specified:

  1. Using the ids query field: it does not seem to be taken into account at all
  2. Using fields to specify which fields to include/exclude in each feature response: it has no effect

Expected behavior
Example advanced search:

curl --location --request POST 'mySTAC/search' \
--header 'Content-Type: application/json' \
--data-raw '{
    "ids": [ "id_1", "id_2"],
    "fields": {
        "include": ["id"],
        "exclude": ["geometry"]
    }
}'

Actual result:

  1. All features are returned, across all collections; the ids filter is ignored.
  2. All fields are returned, including geometry.

Additional context

👋🏻 the fields behavior is specified in an extension that we've never implemented -- https://github.com/radiantearth/stac-api-spec/tree/master/item-search#fields. Can you say some more about how fields would be useful to you? It's always seemed kind of superfluous to me outside of bandwidth-constrained environments.

You're right about the item search though, and I'm getting a PR up to fix that now.

Hi James! Thanks for the quick reply :)

You are right, fields is about saving bandwidth on big result lists. It's not critical and we won't use it for now. I just meant to use it to quickly check whether the id I was requesting was in the result list.

For the ids query, I see from the specs that if set:

All other filter parameters that further restrict the number of search results are ignored

Does this mean, for example, that specifying collections will have no effect?
Example:

  • if id_1 is in collection_a, and we query for
{
  "collections": ["collection_b"], 
  "ids": ["id_1"]
}

then id_1 will be returned anyway
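
To make the question concrete, here is a minimal Python sketch of the rule as quoted from the spec, where ids short-circuits every other filter. The in-memory ITEMS list and the function name search_ids_override are hypothetical and exist only for illustration; this is not Franklin's code.

```python
# Hypothetical in-memory catalog: id_1 lives in collection_a.
ITEMS = [
    {"id": "id_1", "collection": "collection_a"},
    {"id": "id_2", "collection": "collection_b"},
    {"id": "id_3", "collection": "collection_b"},
]


def search_ids_override(items, ids=None, collections=None):
    """Apply the quoted rule: when ids is set, other filters are ignored."""
    if ids:
        return [item for item in items if item["id"] in ids]
    if collections:
        return [item for item in items if item["collection"] in collections]
    return list(items)


# The collections filter is silently dropped because ids is present.
result = search_ids_override(ITEMS, ids=["id_1"], collections=["collection_b"])
print([item["id"] for item in result])  # ['id_1']
```

Under this reading, id_1 comes back even though it is not in collection_b.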

Thanks for addressing this one! It's important to us when there is some client-side grouping that does not fit the STAC model.

All other filter parameters that further restrict the number of search results are ignored

I didn't notice this before. The next big spec push is around the API spec, sometime in the next few months, so there's time to refine how that works if the currently described behavior isn't great. I'd prefer to apply whatever filters a client provides (Franklin's current behavior) instead of requiring consumers to understand the spec well enough to know that some filters won't be applied under certain conditions. What do you think?

Closed as fixed, but we can continue talking about the filter behavior here -- I'll open another issue for implementing the fields extension

I didn't notice this before. The next big spec push is around the API spec, sometime in the next few months, so there's time to refine how that works if the currently described behavior isn't great. I'd prefer to apply whatever filters a client provides (Franklin's current behavior) instead of requiring consumers to understand the spec well enough to know that some filters won't be applied under certain conditions. What do you think?

I fully agree, I thought this was quite a weird specification tbh, counter-intuitive at least.

Originally I thought:

  • "collections": results have to be in one of those collections
  • "ids": results have to be in this list of ids

--> so composing the two criteria would have meant: "results have to be in one of those collections and in this list of ids".

If users of the API wanted not to care about a certain criterion (e.g. "collections"), they wouldn't specify it imo.
Maybe this specification is meant to simplify STAC implementations?
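
The composition described above (every supplied criterion must hold, and an omitted criterion means "don't care") could be sketched in Python as follows. The ITEMS data and the name search_conjunctive are hypothetical; this only illustrates the semantics and is not how Franklin implements search.

```python
# Hypothetical in-memory catalog, for illustration only.
ITEMS = [
    {"id": "id_1", "collection": "collection_a"},
    {"id": "id_2", "collection": "collection_b"},
    {"id": "id_3", "collection": "collection_b"},
]


def search_conjunctive(items, ids=None, collections=None):
    """Keep an item only if it passes every filter the client supplied."""
    results = []
    for item in items:
        if ids is not None and item["id"] not in ids:
            continue
        if collections is not None and item["collection"] not in collections:
            continue
        results.append(item)
    return results


# id_1 is in collection_a, so asking for it within collection_b matches nothing.
print(search_conjunctive(ITEMS, ids=["id_1"], collections=["collection_b"]))  # []

# Omitting collections means the collection is not constrained.
print([i["id"] for i in search_conjunctive(ITEMS, ids=["id_1"])])  # ['id_1']
```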

Ok I think we're on the same page here -- I'll open an issue in the STAC API specification repo to clarify and ideally revert that choice

awesome! thanks James :)

btw we deployed the latest published Docker tag and the query with ids works as expected in combination with the collections filter, thanks a lot!

@jisantuc just as one example for fields -- when I indexed MODIS and had a well-decimated polygon in the proj:geometry field, it accounted for like 90% of the entire Item JSON, so excluding it by default (but allowing a user to get it if they really wanted) was useful
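
Until a server implements fields, a client can approximate the exclude half of it after the fact. This saves nothing on bandwidth but shows the intended effect. The sketch below is a rough Python approximation: the helper exclude_fields, the sample item, and the use of dotted paths for nested keys are all assumptions for illustration, not the extension's full include/exclude rules.

```python
import copy


def exclude_fields(item, exclude):
    """Return a copy of item without the dotted paths listed in exclude."""
    trimmed = copy.deepcopy(item)
    for path in exclude:
        *parents, leaf = path.split(".")
        node = trimmed
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # parent path is missing, nothing to remove
        else:
            node.pop(leaf, None)
    return trimmed


# Hypothetical item; proj:geometry is placed under properties here.
item = {
    "id": "id_1",
    "geometry": {"type": "Point", "coordinates": [0.0, 0.0]},
    "properties": {
        "datetime": "2020-01-01T00:00:00Z",
        "proj:geometry": {"type": "Polygon", "coordinates": []},
    },
}

trimmed = exclude_fields(item, ["properties.proj:geometry"])
print(sorted(trimmed["properties"]))  # ['datetime']
```

The original item is left untouched, so a caller can still reach the full geometry when it is really needed.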