biothings / biothings_explorer

TRAPI service for BioThings Explorer

Home Page:https://api.bte.ncats.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

KP is returning a curie that wasn't asked for without a query id

maximusunc opened this issue · comments

I can send this curl:

curl -X POST https://api.bte.ncats.io/v1/team/Service%20Provider/query \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "message": {
    "query_graph": {
      "nodes": {
        "e": {
          "ids": [
            "CHEBI:8382",
            "PUBCHEM.COMPOUND:22420",
            "PUBCHEM.COMPOUND:10150081",
            "CHEBI:4911",
            "CHEBI:74947",
            "UNII:4F4X42SYQ6",
            "CHEBI:28748",
            "CHEBI:63635",
            "UNII:T4H8FMA7IM",
            "CHEBI:9168",
            "CHEBI:52717",
            "CHEBI:10110",
            "CHEBI:28445",
            "CHEBI:72690",
            "PUBCHEM.COMPOUND:135565884"
          ],
          "categories": ["biolink:ChemicalEntity"],
          "set_interpretation": "BATCH",
          "constraints": []
        },
        "i": {
          "ids": null,
          "categories": ["biolink:BiologicalEntity"],
          "set_interpretation": "BATCH",
          "constraints": []
        }
      },
      "edges": {
        "edge_2": {
          "subject": "i",
          "object": "e",
          "knowledge_type": null,
          "predicates": ["biolink:affects"],
          "attribute_constraints": [],
          "qualifier_constraints": [
            {
              "qualifier_set": [
                {
                  "qualifier_type_id": "biolink:object_aspect_qualifier",
                  "qualifier_value": "activity_or_abundance"
                },
                {
                  "qualifier_type_id": "biolink:object_direction_qualifier",
                  "qualifier_value": "decreased"
                }
              ]
            }
          ]
        }
      }
    }
  }
}' > service_provider_response.json

And one of the results has CHEBI:4027 as a result for e without a query id, even though it was not asked for.

CHEBI:4027 is obtained from the node normalizer as a primary equivalent of PUBCHEM.COMPOUND:22420, with options conflate:true and drug_chemical_conflate:true, so while it was not explicitly asked for, it was covered by what was explicitly asked for.

Just a quick note that this equivalency is specifically from the dev instance of nodenorm: https://nodenormalization-sri.renci.org/get_normalized_nodes?curie=PUBCHEM.COMPOUND:22420&drug_chemical_conflate=true -- doesn't appear to be found in ci / test / prod nodenorm, so I wouldn't expect you'd see that result from the other instances of BTE...

I've confirmed that this equivalency is not seen on BTE Prod.

Found relevant info (Translator Slack link): NodeNorm Dev was updated 4/19 to use an updated "preferred prefix order", so CHEBI is now preferred over PUBCHEM.COMPOUND. This explains the different primary ID for Dev vs other instances.

@maximusunc Is this sufficient explanation that this issue may be closed?

No, I think that even if you're giving the primary equivalent back, it's still not giving back what was asked and PUBCHEM.COMPOUND:22420 should be listed as the query_id.

Updating from internal discussion, example node binding:

{ "id": "CHEBI:4027", "query_id": "PUBCHEM.COMPOUND:22420", "attributes": [] }

This can be implemented as a post-pass on node bindings after results assembly.

@maximusunc In the interest of prioritization, can you say how critically this issue affects your team?

This issue has pointed out that we need to do some extra error checking on our end to make sure all results are expected. We're currently throwing out the entire response, but I think the plan is to just remove any specific results that can't be linked back to the query graph. We're also going to add the drug_conflation flag to our node norm calls. So this isn't super high priority from our end.

@maximusunc

To double-check, is this a problem Aragorn is encountering? Or some other tool?
And NodeNorm is updating to CI later today....so maybe these problems will start happening on CI afterwards?

I'm wondering if there's a way for your tool to use/send the same primary IDs that our tool is using (we're using the NodeNorm instance corresponding to our tool's instance + setting conflate to true for gene/protein and drug/chem). (Note that we made a change a few months ago to set drug_chemical_conflate to true, after asking Chris Bizon in Slack about it (issue, Translator Slack link).)

Of course, you make a valid point that query_id should be used. I wasn't aware of that query_id was/should be used this way (oops >.<).

Aragorn is encountering this, and we're also using the node norm corresponding to the same env, but we're missing the drug/chem conflation flag. We're going to be adding that flag in soon so that our primary IDs will match, so this specific issue will be resolved, but I think it's also good that we uncovered the query_id issue as well.