biothings / biothings_explorer

TRAPI service for BioThings Explorer

Home Page: https://api.bte.ncats.io



bug: BTE misparsing Automat responses

colleenXu opened this issue

While working on #771, I noticed results in which several Automat edges looked almost identical.

After digging into this, I suspect that some of these edges are created erroneously by BTE and don't exist in the Automat resources.

Example 1: imatinib-KIT from Automat Pharos

If we query BTE for edges between imatinib (PUBCHEM.COMPOUND:5291) and KIT (NCBIGene:3815), we get 5 edges from Automat Pharos (imatinib-KIT.json):

  • 3 have the same pKd of 7.9 and 4 PMIDs in their edge attributes:
    • ef352ea5b97b591391cdc453c24ef097: related_to
    • dbac89955b6e2a343ad9a93ae4918d26: physically_interacts_with
    • b646f32eafa1a151eea892b5a0cf7765: binds
  • 2 have the same qualifier set (causes decreased activity), pKd of 7.89, and 2 PMIDs:
    • 780d2bfeac965695fb923f35ff1c0bb2: related_to
    • 000974f0ffe73e3a28b3f6297d5bd3c3: affects
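The pattern above (edges with identical attributes that differ only in predicate) can be spotted mechanically. A minimal sketch, assuming each knowledge-graph edge is a TRAPI-style dict with `subject`, `object`, `predicate`, and optional `qualifiers` and `attributes` lists; `group_near_duplicates` is a hypothetical helper, not part of BTE:

```python
from collections import defaultdict


def _freeze(value):
    """Turn nested JSON values into hashable tuples (order-insensitive for lists)."""
    if isinstance(value, dict):
        return tuple(sorted((key, _freeze(item)) for key, item in value.items()))
    if isinstance(value, list):
        return tuple(sorted((_freeze(item) for item in value), key=repr))
    return value


def group_near_duplicates(edges):
    """Group edges that share subject, object, qualifiers, and attributes.

    Only the predicate is ignored, so any group with more than one member is a
    set of edges that differ solely in predicate.
    """
    groups = defaultdict(list)
    for edge_id, edge in edges.items():
        signature = (
            edge["subject"],
            edge["object"],
            _freeze(edge.get("qualifiers") or []),
            _freeze(edge.get("attributes") or []),
        )
        groups[signature].append((edge_id, edge["predicate"]))
    return [members for members in groups.values() if len(members) > 1]
```

Real responses may carry extra per-edge attributes; in that case the signature would need to be restricted to the attributes of interest (for example the affinity value and the publications).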

In contrast, when I query Automat Pharos directly for edges between imatinib (PUBCHEM.COMPOUND:5291) and KIT (NCBIGene:3815), I get only 2 edges (automat-pharos-1.json):

  • binds (pKd 7.9 + 4 PMIDs)
  • affects (with the qualifier set, pKd 7.89, 2 PMIDs)

Example 2: imatinib-PDGFRA from Automat DrugCentral

If we query BTE for edges between imatinib (PUBCHEM.COMPOUND:5291) and PDGFRA (NCBIGene:5156; as a Gene and a Protein), we get a situation similar to Example 1: 5 edges from Automat DrugCentral (imatinib-PDGFRA.json):

  • 3 have the same Kd of 7.51 in their edge attributes:
    • f6f4696df6e39957f4cc8a51a1f4e737: related_to
    • 643a722004d529d83ecab7dc2df7031d: physically_interacts_with
    • 547c79b19227f6637110140877059ed3: binds
  • 2 have the same qualifier set (causes decreased activity) and IC50 of 7.3 in their edge attributes:
    • e42d078b7fa828c5ea7e531556289833: related_to
    • 1cec3bbf58fe9ca53a48016975785eaa: affects

In contrast, when I query Automat DrugCentral directly for edges between imatinib (PUBCHEM.COMPOUND:5291) and PDGFRA (UniProtKB:P16234, as a Protein), I get only 2 edges (automat-drugcentral-1.json):

  • binds (Kd of 7.51)
  • affects (with the qualifier set, IC50 of 7.3)

Note: this example also has 2 almost-identical Automat Pharos edges (related_to and affects, both with the same qualifier set, pIC50 of 8.7, and 3 PMIDs). The affects edge is likely the real one.


I think this is what is happening:

  • When BTE queries an Automat KP using a MetaEdge's predicate, the KP also returns edges whose predicate is a descendant of that predicate (not an exact match).
  • BTE then assigns the record's predicate from the MetaEdge, creating an edge with the wrong predicate (one that doesn't exist in the Automat resource).
  • So a possible solution could be to drop a record from a TRAPI KP when its predicate/qualifier set doesn't match the MetaEdge's?
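The proposed fix can be sketched as a post-filter on the records returned for each sub-query. This is a hypothetical illustration in Python, not necessarily how the deployed fix works: `MetaEdge`, `qualifier_set`, and `keep_matching_records` are made-up names, and edges are assumed to be TRAPI-style dicts whose `qualifiers` entries carry `qualifier_type_id` and `qualifier_value`.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MetaEdge:
    """Hypothetical stand-in for the MetaEdge used to build one KP sub-query."""
    predicate: str
    qualifiers: frozenset = field(default_factory=frozenset)


def qualifier_set(edge):
    """Collect an edge's qualifiers as a set of (type, value) pairs."""
    return frozenset(
        (q["qualifier_type_id"], q["qualifier_value"])
        for q in edge.get("qualifiers") or []
    )


def keep_matching_records(kp_edges, meta_edge):
    """Drop KP edges whose predicate or qualifier set differs from the MetaEdge's.

    Without this check, an edge returned for a descendant predicate (e.g. 'binds'
    when the sub-query asked for 'related_to') would be relabelled with the
    MetaEdge's predicate, producing an edge the KP never stated.
    """
    return {
        edge_id: edge
        for edge_id, edge in kp_edges.items()
        if edge["predicate"] == meta_edge.predicate
        and qualifier_set(edge) == meta_edge.qualifiers
    }
```

Exact matching is the strictest reading of the proposal. An alternative would be to keep such records but label them with the predicate the KP actually returned; that would preserve the more specific edge, but the same KP edge would then come back once per MetaEdge sub-query and would need de-duplicating.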

However, there are some oddities. BTE doesn't seem to create extra edges in every case:

  • Not for every Automat KP edge. So there may be some inconsistency or difference between the meta_knowledge_graphs of the Automat KPs.
    • For example, in both cases above, there's an Automat Hetio edge that doesn't show this "duplicating" behavior.
  • Not for qualifier sets. It's good that BTE isn't doing this! But I'm a bit suspicious that it could happen and just isn't captured by these examples.
    • Ex: there are edges with the qualifier set "causes decreased activity" but no edges for subsets of that qualifier set ("causes", "causes activity", "decreased activity", etc.).

And a note: while I think there are real, separate edges in Automat Pharos vs Hetio vs DrugCentral, these edges are very similar, as if they come from the same underlying source (ChEMBL?). I don't know if we want to bring that up with the Automat team.

@tokebe @andrewsu I imagine we'd want this bug addressed before the upcoming release.

Considering this high priority.

I've confirmed that things work as expected after the Prod deployment. Closing the issue.

Follow-up for example 1

POST to https://bte.transltr.io/v1/query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["PUBCHEM.COMPOUND:5291"],
                    "categories":["biolink:ChemicalEntity"],
                    "name": "imatinib"
                },
                "n1": {
                    "ids":["NCBIGene:3815"],
                    "categories":["biolink:Gene"],
                    "name": "KIT"
                }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}
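To rerun these follow-up checks from a script, the query above can be built and POSTed with Python's standard library. A minimal sketch: `build_one_hop_query` and `post_query` are hypothetical helpers, the timeout is an arbitrary choice, and reading `message.knowledge_graph.edges` from the response assumes the standard TRAPI response layout.

```python
import json
import urllib.request

BTE_QUERY_URL = "https://bte.transltr.io/v1/query"


def _node(curie, categories, name):
    """One query-graph node pinned to a single CURIE."""
    return {"ids": [curie], "categories": list(categories), "name": name}


def build_one_hop_query(subject, obj):
    """Build a one-edge TRAPI query (n0 -> n1) with no predicate constraint.

    `subject` and `obj` are (CURIE, categories, name) tuples.
    """
    return {"message": {"query_graph": {
        "nodes": {"n0": _node(*subject), "n1": _node(*obj)},
        "edges": {"e1": {"subject": "n0", "object": "n1"}},
    }}}


def post_query(query, url=BTE_QUERY_URL, timeout=300):
    """POST a TRAPI query as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (network call, so shown as a comment):
# query = build_one_hop_query(
#     ("PUBCHEM.COMPOUND:5291", ["biolink:ChemicalEntity"], "imatinib"),
#     ("NCBIGene:3815", ["biolink:Gene"], "KIT"),
# )
# kg_edges = post_query(query)["message"]["knowledge_graph"]["edges"]
```

Leaving the edge without `predicates` matches the queries in this thread, so BTE returns edges of any predicate between the two nodes.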

I now get only the two Automat Pharos edges that are in the original API (automat-pharos-fixed.json):

  • b646f32eafa1a151eea892b5a0cf7765: binds (pKd of 7.9 and 4 PMIDs)
  • 000974f0ffe73e3a28b3f6297d5bd3c3: affects (qualifier set "causes decreased activity", pKd of 7.89, and 2 PMIDs)

Follow-up for example 2

POST to https://bte.transltr.io/v1/query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["PUBCHEM.COMPOUND:5291"],
                    "categories":["biolink:ChemicalEntity"],
                    "name": "imatinib"
                },
                "n1": {
                    "ids":["NCBIGene:5156"],
                    "categories":["biolink:Gene", "biolink:Protein"],
                    "name": "PDGFRA"
                }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1"
                }
            }
        }
    }
}

I now get only the two Automat DrugCentral edges that are in the original API (automat-drugcentral-fixed.json):

  • 547c79b19227f6637110140877059ed3: binds (Kd of 7.51)
  • 1cec3bbf58fe9ca53a48016975785eaa: affects (qualifier set "causes decreased activity" and IC50 of 7.3)
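Both follow-ups amount to a set check: every BTE edge attributed to a KP should correspond to some edge that the KP returns directly. A minimal sketch, assuming TRAPI-style edge dicts; `unexpected_edges` is a hypothetical helper, and selecting which BTE edges came from a given KP is left out. The comparison uses only the predicate and qualifier set because the two queries used different identifiers for the object node (NCBIGene:5156 via BTE vs UniProtKB:P16234 directly).

```python
def predicate_and_qualifiers(edge):
    """Reduce an edge to (predicate, frozenset of (qualifier type, value))."""
    qualifiers = frozenset(
        (q["qualifier_type_id"], q["qualifier_value"])
        for q in edge.get("qualifiers") or []
    )
    return edge["predicate"], qualifiers


def unexpected_edges(bte_edges, kp_edges):
    """Return sorted IDs of BTE edges with no predicate/qualifier match in the KP's own edges."""
    expected = {predicate_and_qualifiers(edge) for edge in kp_edges.values()}
    return sorted(
        edge_id
        for edge_id, edge in bte_edges.items()
        if predicate_and_qualifiers(edge) not in expected
    )
```

Before the fix, the related_to and physically_interacts_with edges in the examples above would be reported as unexpected; after it, the list should be empty.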