bug: BTE misparsing Automat responses
colleenXu opened this issue
While working on #771, I noticed results where the Automat edges seemed almost identical.
After digging into this, I suspect that some of these edges are erroneously created by BTE and don't actually exist.
Example 1: imatinib-KIT from Automat Pharos
If we query BTE for edges between imatinib (PUBCHEM.COMPOUND:5291) and KIT (NCBIGene:3815), we get 5 edges from Automat Pharos (imatinib-KIT.json):
- 3 have the same pKd of 7.9 and 4 PMIDs in their edge attributes:
  - ef352ea5b97b591391cdc453c24ef097: related_to
  - dbac89955b6e2a343ad9a93ae4918d26: physically_interacts_with
  - b646f32eafa1a151eea892b5a0cf7765: binds
- 2 have the same qualifier set (causes decreased activity), pKd of 7.89, and 2 PMIDs:
  - 780d2bfeac965695fb923f35ff1c0bb2: related_to
  - 000974f0ffe73e3a28b3f6297d5bd3c3: affects
But when I query Automat Pharos directly for edges between imatinib (PUBCHEM.COMPOUND:5291) and KIT (NCBIGene:3815), I get only 2 edges (automat-pharos-1.json):
- binds (pKd 7.9 + 4 PMIDs)
- affects (with the qualifier set, pKd 7.89, 2 PMIDs)
Example 2: imatinib-PDGFRA from Automat DrugCentral
If we query BTE for edges between imatinib (PUBCHEM.COMPOUND:5291) and PDGFRA (NCBIGene:5156; as a Gene and a Protein), we get a similar situation to Example 1: 5 edges from Automat DrugCentral (imatinib-PDGFRA.json):
- 3 have the same Kd of 7.51 in their edge attributes:
  - f6f4696df6e39957f4cc8a51a1f4e737: related_to
  - 643a722004d529d83ecab7dc2df7031d: physically_interacts_with
  - 547c79b19227f6637110140877059ed3: binds
- 2 have the same qualifier set (causes decreased activity) and IC50 of 7.3 in the edge-attributes:
  - e42d078b7fa828c5ea7e531556289833: related_to
  - 1cec3bbf58fe9ca53a48016975785eaa: affects
But when I query Automat DrugCentral directly for edges between imatinib (PUBCHEM.COMPOUND:5291) and PDGFRA (UniProtKB:P16234, as a Protein), I get only 2 edges (automat-drugcentral-1.json):
- binds (Kd of 7.51)
- affects (with the qualifier set, IC50 of 7.3)
Note: this example also has 2 almost-identical Automat Pharos edges (related_to and affects, both with the same qualifier set, pIC50 of 8.7, and 3 PMIDs). The affects edge is likely the real one.
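A rough way to spot these near-identical edges automatically is to group edges by everything except the predicate and flag groups that carry more than one predicate. This is only a sketch: the flat edge dicts (`subject`, `object`, `source`, `predicate`, `attributes`) and the `pmid_count` key are a simplified, hypothetical shape rather than the actual TRAPI edge format, and `find_suspect_groups` is a made-up helper name.

```python
from collections import defaultdict


def find_suspect_groups(edges):
    """Return predicate lists for edges that match on everything but predicate."""
    groups = defaultdict(list)
    for edge in edges:
        # The fingerprint deliberately leaves out the predicate, so edges that
        # differ only by predicate land in the same group.
        fingerprint = (
            edge["subject"],
            edge["object"],
            edge["source"],
            tuple(sorted(edge["attributes"].items())),
        )
        groups[fingerprint].append(edge["predicate"])
    return [sorted(preds) for preds in groups.values() if len(preds) > 1]


# Simplified stand-ins for the 5 imatinib-KIT edges from Example 1.
base = {"subject": "PUBCHEM.COMPOUND:5291", "object": "NCBIGene:3815",
        "source": "automat-pharos"}
binding = {"pKd": 7.9, "pmid_count": 4}
activity = {"pKd": 7.89, "pmid_count": 2,
            "qualifiers": "causes decreased activity"}
example_edges = [
    {**base, "predicate": "related_to", "attributes": binding},
    {**base, "predicate": "physically_interacts_with", "attributes": binding},
    {**base, "predicate": "binds", "attributes": binding},
    {**base, "predicate": "related_to", "attributes": activity},
    {**base, "predicate": "affects", "attributes": activity},
]

suspects = find_suspect_groups(example_edges)
```

With the Example 1 stand-ins, this yields two groups, matching the "3 edges" and "2 edges" clusters described above.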
I think this is what is happening:
- When BTE queries Automat KPs following a MetaEdge's info, Automat KPs will return edges where the predicate is a descendant (not an exact match)
- But BTE will then assign the record's predicate using the MetaEdge's info - creating an Edge with the wrong predicate (doesn't actually exist in the Automat resource)
- So a possible solution could be to drop a record for a TRAPI KP when the predicate/qualifier-set of the record doesn't match that of the MetaEdge?
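To make the proposed fix concrete, here is a minimal sketch of the "drop on mismatch" idea. It is not BTE's actual code (BTE isn't written in Python); the record and MetaEdge dict shapes and the names `matches_meta_edge` and `keep_exact_matches` are assumptions for illustration only.

```python
def matches_meta_edge(record, meta_edge):
    """True only when predicate and qualifier set are exact matches."""
    same_predicate = record["predicate"] == meta_edge["predicate"]
    # A missing qualifier set is treated as an empty one on both sides.
    same_qualifiers = record.get("qualifiers", {}) == meta_edge.get("qualifiers", {})
    return same_predicate and same_qualifiers


def keep_exact_matches(records, meta_edge):
    """Drop records the KP returned only because of predicate hierarchy expansion."""
    return [r for r in records if matches_meta_edge(r, meta_edge)]


# What Automat Pharos actually holds for imatinib-KIT (simplified).
decreased_activity = {
    "qualified_predicate": "causes",
    "object_direction_qualifier": "decreased",
    "object_aspect_qualifier": "activity",
}
kp_records = [
    {"predicate": "binds"},
    {"predicate": "affects", "qualifiers": decreased_activity},
]

# Querying via a related_to MetaEdge returns both descendants; both get dropped.
from_related_to = keep_exact_matches(kp_records, {"predicate": "related_to"})
# Querying via the binds MetaEdge keeps the one real binds edge.
from_binds = keep_exact_matches(kp_records, {"predicate": "binds"})
```

This only works if a MetaEdge exists for each predicate the KP really serves (binds and affects here); otherwise the real edges would be dropped along with the spurious ones.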
However, there are some oddities. BTE doesn't seem to be creating extra edges...
- ...in every case of an Automat KP edge. So there may be some inconsistency or difference in meta_knowledge_graphs between Automat KPs.
  - For example, in both of the cases above, there's an Automat Hetio edge that doesn't show this "duplicating" behavior.
- ...for qualifier-sets. It's good that BTE isn't doing this! But I'm a bit suspicious that this could happen and just isn't captured by these examples.
  - Ex: there are edges with the qualifier-set "causes decreased activity" but no edges for subsets of that qualifier set ("causes", "causes activity", "decreased activity", etc).
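If we want to check for the qualifier-subset case on more responses, something like the sketch below could help. It flags edges whose non-empty qualifier set is a proper subset of another edge's qualifier set for the same subject/object pair. The edge dict shape, the qualifier keys, and the helper name `proper_subset_pairs` are assumptions for illustration, and the third edge in the second list is invented purely to show what a hit would look like.

```python
def proper_subset_pairs(edges):
    """Return (smaller_id, larger_id) pairs where qualifiers are a proper subset."""
    pairs = []
    for small in edges:
        for big in edges:
            same_nodes = (small["subject"], small["object"]) == (
                big["subject"], big["object"])
            small_q = set(small["qualifiers"].items())
            big_q = set(big["qualifiers"].items())
            # Skip unqualified edges: an empty set is a subset of everything.
            if same_nodes and small_q and small_q < big_q:
                pairs.append((small["id"], big["id"]))
    return pairs


nodes = {"subject": "PUBCHEM.COMPOUND:5291", "object": "NCBIGene:3815"}
full_set = {
    "qualified_predicate": "causes",
    "object_direction_qualifier": "decreased",
    "object_aspect_qualifier": "activity",
}
observed = [
    {**nodes, "id": "b646f32eafa1a151eea892b5a0cf7765", "qualifiers": {}},
    {**nodes, "id": "000974f0ffe73e3a28b3f6297d5bd3c3", "qualifiers": full_set},
]
# Invented edge carrying only "causes", to demonstrate a positive result.
with_invented = observed + [
    {**nodes, "id": "hypothetical-subset-edge",
     "qualifiers": {"qualified_predicate": "causes"}},
]

observed_hits = proper_subset_pairs(observed)
invented_hits = proper_subset_pairs(with_invented)
```

On the two real edges it finds nothing, which is consistent with what I'm seeing in these examples.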
And a note: while I think there are real, separate edges in Automat Pharos vs Hetio vs DrugCentral, these edges are very similar, as if they come from the same underlying source (ChEMBL?). Dunno if we want to bring that up with the Automat team...
Considering this high priority.
I've confirmed that things work as expected after the Prod deployment. Closing issue.
Follow-up for example 1
POST to https://bte.transltr.io/v1/query
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["PUBCHEM.COMPOUND:5291"],
          "categories": ["biolink:ChemicalEntity"],
          "name": "imatinib"
        },
        "n1": {
          "ids": ["NCBIGene:3815"],
          "categories": ["biolink:Gene"],
          "name": "KIT"
        }
      },
      "edges": {
        "e1": {
          "subject": "n0",
          "object": "n1"
        }
      }
    }
  }
}
I now get only the two Automat Pharos edges that are in the original API (automat-pharos-fixed.json):
- b646f32eafa1a151eea892b5a0cf7765: binds (pKd of 7.9 and 4 PMIDs)
- 000974f0ffe73e3a28b3f6297d5bd3c3: affects (qualifier set (causes decreased activity), pKd of 7.89, and 2 PMIDs)
Follow-up for example 2
POST to https://bte.transltr.io/v1/query
{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "ids": ["PUBCHEM.COMPOUND:5291"],
          "categories": ["biolink:ChemicalEntity"],
          "name": "imatinib"
        },
        "n1": {
          "ids": ["NCBIGene:5156"],
          "categories": ["biolink:Gene", "biolink:Protein"],
          "name": "PDGFRA"
        }
      },
      "edges": {
        "e1": {
          "subject": "n0",
          "object": "n1"
        }
      }
    }
  }
}
I now get only the two Automat DrugCentral edges that are in the original API (automat-drugcentral-fixed.json):
- 547c79b19227f6637110140877059ed3: binds (Kd of 7.51)
- 1cec3bbf58fe9ca53a48016975785eaa: affects (qualifier set (causes decreased activity) and IC50 of 7.3)
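For anyone who wants to rerun both follow-up checks, the two request bodies above differ only in the `n1` node, so they can be built from one helper. This is a sketch using only the Python standard library; `build_query` and `post_query` are hypothetical helper names, and the 300-second timeout is an arbitrary assumption, not a documented BTE limit.

```python
import json
import urllib.request

BTE_URL = "https://bte.transltr.io/v1/query"


def build_query(subject_node, object_node):
    """Build a one-hop TRAPI query with no predicate constraint on the edge."""
    return {
        "message": {
            "query_graph": {
                "nodes": {"n0": subject_node, "n1": object_node},
                "edges": {"e1": {"subject": "n0", "object": "n1"}},
            }
        }
    }


def post_query(payload, url=BTE_URL, timeout=300):
    """POST the payload as JSON and return the decoded response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


imatinib = {"ids": ["PUBCHEM.COMPOUND:5291"],
            "categories": ["biolink:ChemicalEntity"], "name": "imatinib"}
kit = {"ids": ["NCBIGene:3815"],
       "categories": ["biolink:Gene"], "name": "KIT"}
pdgfra = {"ids": ["NCBIGene:5156"],
          "categories": ["biolink:Gene", "biolink:Protein"], "name": "PDGFRA"}

kit_query = build_query(imatinib, kit)        # follow-up for example 1
pdgfra_query = build_query(imatinib, pdgfra)  # follow-up for example 2
```

Calling `post_query(kit_query)` or `post_query(pdgfra_query)` sends the request; nothing is sent when the snippet is merely loaded.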