For Multiomics/Text-Mining APIs, avoid merging records that only differ by their `edge.sources` contents
colleenXu opened this issue
In both #775 and #774 (comment) (see the third bullet under "implementation notes" and the collapsed details), we noticed that BTE merges records whose TRAPI edge-source content differs. In the x-bte annotation, we use the `trapi_sources` response-mapping keyword (feature from #617) to tell BTE to ingest and handle this content.
This can lead to undesired behavior, such as a TRAPI edge with two primary knowledge sources, which should trigger a TRAPI validation error. See those issues for examples.
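A check along these lines would catch the problem: count the `primary_knowledge_source` entries in an edge's `sources` list and flag anything other than exactly one. This is a hedged sketch, not the TRAPI validator's code; `checkPrimarySources` and the trimmed types are hypothetical, and the only rule encoded is the one this issue relies on (one primary knowledge source per edge). The sample edge is the merged imatinib/KIT edge from the testing notes in this thread, reduced to its sources.

```typescript
// Trimmed TRAPI shapes: only the fields this check reads.
interface RetrievalSource {
  resource_id: string;
  resource_role: string;
  upstream_resource_ids?: string[];
}
interface TrapiEdge {
  subject: string;
  predicate: string;
  object: string;
  sources: RetrievalSource[];
}

// Hypothetical helper: return a list of problems with an edge's sources.
// The only rule checked is "exactly one primary_knowledge_source per edge".
function checkPrimarySources(edge: TrapiEdge): string[] {
  const primaries = edge.sources
    .filter((s) => s.resource_role === "primary_knowledge_source")
    .map((s) => s.resource_id);
  if (primaries.length === 1) return [];
  return [
    `expected exactly 1 primary_knowledge_source, found ${primaries.length}: ${primaries.join(", ")}`,
  ];
}

// The merged imatinib -> KIT edge from this issue, reduced to its sources.
const mergedEdge: TrapiEdge = {
  subject: "PUBCHEM.COMPOUND:5291",
  predicate: "biolink:physically_interacts_with",
  object: "NCBIGene:3815",
  sources: [
    { resource_id: "infores:ttd", resource_role: "primary_knowledge_source" },
    {
      resource_id: "infores:biothings-multiomics-biggim-drugresponse",
      resource_role: "aggregator_knowledge_source",
      upstream_resource_ids: ["infores:ttd", "infores:drugcentral"],
    },
    { resource_id: "infores:drugcentral", resource_role: "primary_knowledge_source" },
  ],
};

console.log(checkPrimarySources(mergedEdge));
```

An edge that keeps only one of the two primary entries passes this check, which is the shape the fix should produce.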
A Monarch-specific fix was implemented in biothings/api-respone-transform.js@3534b23, but it would be nice to have a solution for all APIs that use `trapi_sources`. Currently, these are only the BioThings APIs we build in collaboration with other Translator teams (Multiomics and Text-Mining).
Note: if possible, it would be nice to have a solution that could work with any API (including non-BioThings, external ones), in case we later post-process other APIs' responses to create TRAPI source content (similar to, but probably more involved than, what we did with the Monarch API).
Fix implemented as part of biothings/api-respone-transform.js#63
Oh oops, I didn't realize that the Monarch API solution might actually work for all APIs.
I guess my next step is finding an example to test, to confirm that it works with another API...
The fix I implemented in the Monarch PR affects how Record hashes are calculated across the board (i.e. all records now calculate their hash using their source/provenance chain), so if I understand this issue correctly, the fix should cover it completely.
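The idea behind a provenance-aware hash can be sketched as follows. This is not the code from biothings/api-respone-transform.js#63; it only illustrates the principle under stated assumptions. `tripleKey`, `provenanceKey`, `countGroups`, and `makeRecord` are hypothetical names, and the sketch builds a plain string key rather than a hash digest. Records that share a key are treated as one edge, so adding the source chain to the key keeps the TTD and drugcentral records apart.

```typescript
// Trimmed shapes: only the fields used to build a key.
interface Source {
  resource_id: string;
  resource_role: string;
  upstream_resource_ids?: string[];
}
interface RecordLike {
  subject: string;
  predicate: string;
  object: string;
  sources: Source[];
}

// Hypothetical triple-only key: records that differ only in their sources collide.
function tripleKey(r: RecordLike): string {
  return [r.subject, r.predicate, r.object].join("|");
}

// Hypothetical provenance-aware key: append a canonical (sorted) form of the
// source chain so that list order alone doesn't produce different keys.
function provenanceKey(r: RecordLike): string {
  const chain = r.sources
    .map((s) => {
      const upstream = (s.upstream_resource_ids || []).slice().sort().join(",");
      return `${s.resource_role}:${s.resource_id}<-[${upstream}]`;
    })
    .sort()
    .join(";");
  return `${tripleKey(r)}|${chain}`;
}

// Count how many edges would result if records sharing a key were merged.
function countGroups(records: RecordLike[], key: (r: RecordLike) => string): number {
  const groups: { [k: string]: RecordLike[] } = {};
  for (const r of records) {
    const k = key(r);
    const bucket: RecordLike[] = groups[k] || [];
    bucket.push(r);
    groups[k] = bucket;
  }
  return Object.keys(groups).length;
}

// Hypothetical builder for an imatinib -> KIT record with a given primary source.
function makeRecord(primary: string): RecordLike {
  return {
    subject: "PUBCHEM.COMPOUND:5291",
    predicate: "biolink:physically_interacts_with",
    object: "NCBIGene:3815",
    sources: [
      { resource_id: primary, resource_role: "primary_knowledge_source" },
      {
        resource_id: "infores:biothings-multiomics-biggim-drugresponse",
        resource_role: "aggregator_knowledge_source",
        upstream_resource_ids: [primary],
      },
    ],
  };
}

const ttd = makeRecord("infores:ttd");
const drugcentral = makeRecord("infores:drugcentral");

console.log(countGroups([ttd, drugcentral], tripleKey)); // 1: merged into one edge
console.log(countGroups([ttd, drugcentral], provenanceKey)); // 2: kept as separate edges
```

Because the key depends only on record content and not on which API produced it, this style of fix applies to any API that emits source information, which matches the reasoning in the comment above.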
Jackson and I confirmed that biothings/api-respone-transform.js#63 addresses #775 in a more robust way.
So we can revert biothings/bte_trapi_query_graph_handler#178 as part of #784 (changing that issue to be about removing temporary workarounds we no longer need).
How I tested this
First, in a local BTE install, make sure every package is on its main branch. Remove or comment out the line added in https://github.com/biothings/bte_trapi_query_graph_handler/pull/178/files (remember to rebuild, if needed, so the change takes effect).
Then send this query to Multiomics biggim-drug-response through BTE: http://localhost:3000/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query
Query
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"categories":["biolink:ChemicalEntity"],
"name": "imatinib"
},
"n1": {
"categories":["biolink:Gene"]
}
},
"edges": {
"e1": {
"subject": "n0",
"object": "n1",
"predicates": ["biolink:physically_interacts_with"]
}
}
}
}
}
The response will contain an odd edge to the gene KIT (NCBIGene:3815) with two primary knowledge sources in its `sources` section: TTD and drugcentral.
odd edge: bug
"b46ae093d111737f77f8c728368c0040": {
"predicate": "biolink:physically_interacts_with",
"subject": "PUBCHEM.COMPOUND:5291",
"object": "NCBIGene:3815",
"attributes": [
{
"attribute_type_id": "biolink:primary_knowledge_source",
"attributes": [
{
"attribute_type_id": "biolink:source_infores",
"value": "infores:drugcentral"
}
],
"value": "infores:drugcentral"
},
{
"attribute_type_id": "biolink:aggregator_knowledge_source",
"value": "infores:biothings-multiomics-biggim-drugresponse"
}
],
"sources": [
{
"resource_id": "infores:ttd",
"resource_role": "primary_knowledge_source"
},
{
"resource_id": "infores:biothings-multiomics-biggim-drugresponse",
"resource_role": "aggregator_knowledge_source",
"upstream_resource_ids": [
"infores:ttd",
"infores:drugcentral"
]
},
{
"resource_id": "infores:service-provider-trapi",
"resource_role": "aggregator_knowledge_source",
"upstream_resource_ids": [
"infores:biothings-multiomics-biggim-drugresponse"
]
},
{
"resource_id": "infores:drugcentral",
"resource_role": "primary_knowledge_source"
}
]
},
Next, check out the branch for biothings/api-respone-transform.js#63 (again, rebuild if needed so the change takes effect) and rerun the same query.
The response now contains two separate edges: one for TTD and one for drugcentral.
"c0dd640042751eec03bf660be88d6075": {
"predicate": "biolink:physically_interacts_with",
"subject": "PUBCHEM.COMPOUND:5291",
"object": "NCBIGene:3815",
"attributes": [
{
"attribute_type_id": "biolink:primary_knowledge_source",
"attributes": [
{
"attribute_type_id": "biolink:source_infores",
"value": "infores:ttd"
}
],
"value": "infores:ttd"
},
{
"attribute_type_id": "biolink:aggregator_knowledge_source",
"value": "infores:biothings-multiomics-biggim-drugresponse"
}
],
"sources": [
{
"resource_id": "infores:ttd",
"resource_role": "primary_knowledge_source"
},
{
"resource_id": "infores:biothings-multiomics-biggim-drugresponse",
"resource_role": "aggregator_knowledge_source",
"upstream_resource_ids": [
"infores:ttd"
]
},
{
"resource_id": "infores:service-provider-trapi",
"resource_role": "aggregator_knowledge_source",
"upstream_resource_ids": [
"infores:biothings-multiomics-biggim-drugresponse"
]
}
]
},
"070cd562c2640dc1f5cd8e4f2050fadf": {
"predicate": "biolink:physically_interacts_with",
"subject": "PUBCHEM.COMPOUND:5291",
"object": "NCBIGene:3815",
"attributes": [
{
"attribute_type_id": "biolink:primary_knowledge_source",
"attributes": [
{
"attribute_type_id": "biolink:source_infores",
"value": "infores:drugcentral"
}
],
"value": "infores:drugcentral"
},
{
"attribute_type_id": "biolink:aggregator_knowledge_source",
"value": "infores:biothings-multiomics-biggim-drugresponse"
}
],
"sources": [
{
"resource_id": "infores:drugcentral",
"resource_role": "primary_knowledge_source"
},
{
"resource_id": "infores:biothings-multiomics-biggim-drugresponse",
"resource_role": "aggregator_knowledge_source",
"upstream_resource_ids": [
"infores:drugcentral"
]
},
{
"resource_id": "infores:service-provider-trapi",
"resource_role": "aggregator_knowledge_source",
"upstream_resource_ids": [
"infores:biothings-multiomics-biggim-drugresponse"
]
}
]
},
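To repeat this comparison without reading the JSON by eye, a small helper can list the primary knowledge source(s) of every edge between imatinib and KIT in a response's `knowledge_graph.edges`. This is a sketch: `primariesBetween` is a hypothetical name, and the edge maps below are trimmed copies of the edges shown in this thread (attributes and non-primary sources omitted). With the fix, expect two edge IDs with one primary source each; with the bug, one edge ID with two.

```typescript
// Trimmed TRAPI types: only the fields this check reads.
interface Source {
  resource_id: string;
  resource_role: string;
  upstream_resource_ids?: string[];
}
interface KgEdge {
  subject: string;
  predicate: string;
  object: string;
  sources: Source[];
}

// Hypothetical helper: map each edge ID between `subject` and `object`
// to the primary knowledge source(s) listed in its `sources`.
function primariesBetween(
  edges: { [id: string]: KgEdge },
  subject: string,
  object: string,
): { [id: string]: string[] } {
  const out: { [id: string]: string[] } = {};
  for (const id of Object.keys(edges)) {
    const edge = edges[id];
    if (edge.subject !== subject || edge.object !== object) continue;
    out[id] = edge.sources
      .filter((s) => s.resource_role === "primary_knowledge_source")
      .map((s) => s.resource_id);
  }
  return out;
}

const IMATINIB = "PUBCHEM.COMPOUND:5291";
const KIT = "NCBIGene:3815";
const PREDICATE = "biolink:physically_interacts_with";

// Edge IDs and primary sources copied from the responses above; other fields trimmed.
const beforeFix: { [id: string]: KgEdge } = {
  "b46ae093d111737f77f8c728368c0040": {
    subject: IMATINIB, predicate: PREDICATE, object: KIT,
    sources: [
      { resource_id: "infores:ttd", resource_role: "primary_knowledge_source" },
      { resource_id: "infores:drugcentral", resource_role: "primary_knowledge_source" },
    ],
  },
};
const afterFix: { [id: string]: KgEdge } = {
  "c0dd640042751eec03bf660be88d6075": {
    subject: IMATINIB, predicate: PREDICATE, object: KIT,
    sources: [{ resource_id: "infores:ttd", resource_role: "primary_knowledge_source" }],
  },
  "070cd562c2640dc1f5cd8e4f2050fadf": {
    subject: IMATINIB, predicate: PREDICATE, object: KIT,
    sources: [{ resource_id: "infores:drugcentral", resource_role: "primary_knowledge_source" }],
  },
};

console.log(primariesBetween(beforeFix, IMATINIB, KIT));
console.log(primariesBetween(afterFix, IMATINIB, KIT));
```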
Also, I think this will only be an issue for Multiomics biggim-drug-response, because that KP uses multiple sources. The other APIs are essentially single-source:
- multiomics ehr risk
- multiomics wellness
- multiomics clinicaltrials
- text-mining targeted
@colleenXu Can this issue be closed? Relevant BTE code is now on Prod.
Confirmed that it's fixed: posting the example query to https://bte.transltr.io/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query (Prod instance) now returns two edges.