biothings / biothings_explorer

TRAPI service for BioThings Explorer

Home Page: https://api.bte.ncats.io


For Multiomics/Text-Mining APIs, avoid merging records that only differ by their `edge.sources` contents

colleenXu opened this issue

In both #775 and #774 (comment) (see the third bullet in "implementation notes" and the collapsed details), we noticed that BTE will merge records that have different TRAPI-edge-source content. For x-bte annotation, we use the trapi_sources response-mapping keyword (feature from #617) to tell BTE to ingest/handle this content.

This can lead to undesired behavior, like a TRAPI edge that has two primary knowledge sources - this should cause a TRAPI validation error. See those issues for examples.

A fix specific to the Monarch API was implemented in biothings/api-respone-transform.js@3534b23, but it would be nice to implement a solution for all APIs that use trapi_sources. Right now, these are only the BioThings APIs we make in collaboration with other Translator teams: Multiomics and Text-Mining.

Note: If possible, it would be nice to have a solution that could theoretically work with any API (non-BioThings, external), just in case we do post-processing to create TRAPI-source content with other APIs in the future (similar to, but probably more involved than, what we did with the Monarch API).

oh ooops, I didn't realize that the solution for Monarch API actually may work for all APIs?

I guess my next step is finding an example to test, to confirm that it works with another API...

The fix I implemented in the Monarch PR affects how Record hashes are calculated across the board (i.e. all records now calculate their hash using their source/provenance chain), so if I understand this issue correctly, the fix should cover it completely.
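To illustrate why hashing on the source/provenance chain keeps such records apart, here is a minimal Python sketch. It is not BTE's actual implementation (BTE is written in JavaScript/TypeScript): the `record_hash` function, the record field names, and the choice of MD5 are all assumptions made for illustration only.

```python
import hashlib
import json


def record_hash(record: dict, include_sources: bool = True) -> str:
    """Hash the fields that identify a record (hypothetical field names)."""
    key = {
        "subject": record["subject"],
        "predicate": record["predicate"],
        "object": record["object"],
        "api": record["api"],
    }
    if include_sources:
        # Sort so the order in which sources are listed does not affect the hash.
        key["sources"] = sorted(
            (s["resource_id"], s["resource_role"]) for s in record["sources"]
        )
    encoded = json.dumps(key, sort_keys=True).encode("utf-8")
    return hashlib.md5(encoded).hexdigest()


# Two records that differ only in their primary knowledge source.
ttd_record = {
    "subject": "PUBCHEM.COMPOUND:5291",
    "predicate": "biolink:physically_interacts_with",
    "object": "NCBIGene:3815",
    "api": "biggim-drug-response",  # illustrative label, not a real BTE field value
    "sources": [
        {"resource_id": "infores:ttd", "resource_role": "primary_knowledge_source"}
    ],
}
drugcentral_record = {
    **ttd_record,
    "sources": [
        {
            "resource_id": "infores:drugcentral",
            "resource_role": "primary_knowledge_source",
        }
    ],
}

# Without sources in the hash, the two records collide and would be merged.
same_without = record_hash(ttd_record, False) == record_hash(drugcentral_record, False)
# With sources in the hash, they stay distinct.
same_with = record_hash(ttd_record) == record_hash(drugcentral_record)
print(same_without, same_with)
```

Because the sources are part of the hashed key, any API whose records carry different source content gets distinct hashes, which is why a fix at this level would not need to be API-specific.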

Jackson and I confirmed that biothings/api-respone-transform.js#63 addresses #775 in a more robust way.

So we can revert biothings/bte_trapi_query_graph_handler#178 as part of #784 (changing that issue to be about removing temporary things we no longer need).


How I tested this

First, in a local install of BTE, make sure every repo is on its main branch. Remove or comment out the line added in https://github.com/biothings/bte_trapi_query_graph_handler/pull/178/files (remember to build if needed to incorporate the change).

Then send this query to Multiomics biggim-drug-response through BTE: http://localhost:3000/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query

Query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["PUBCHEM.COMPOUND:5291"],
                    "categories":["biolink:ChemicalEntity"],
                    "name": "imatinib"
                },
                "n1": {
                    "categories":["biolink:Gene"]
                }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:physically_interacts_with"]
                }
            }
        }
    }
}
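For anyone who prefers scripting this step, the query above could be sent with a short Python sketch like the following. It uses only the standard library; the helper names `build_query` and `post_query` and the timeout value are assumptions, and the URL is the local endpoint mentioned above.

```python
import json
import urllib.request

# Local BTE endpoint for Multiomics biggim-drug-response, as given above.
BTE_URL = "http://localhost:3000/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query"


def build_query() -> dict:
    """Return the one-hop imatinib -> Gene query shown above."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    "n0": {
                        "ids": ["PUBCHEM.COMPOUND:5291"],
                        "categories": ["biolink:ChemicalEntity"],
                        "name": "imatinib",
                    },
                    "n1": {"categories": ["biolink:Gene"]},
                },
                "edges": {
                    "e1": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:physically_interacts_with"],
                    }
                },
            }
        }
    }


def post_query(url: str, query: dict, timeout: float = 300.0) -> dict:
    """POST a TRAPI query as JSON and return the parsed JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# With a local BTE instance running, the call would look like:
# response = post_query(BTE_URL, build_query())
# edges = response["message"]["knowledge_graph"]["edges"]
```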

The response will have an odd edge to the gene KIT (NCBIGene:3815), with two primary knowledge sources in the sources section: TTD and drugcentral.

odd edge: bug

                "b46ae093d111737f77f8c728368c0040": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:5291",
                    "object": "NCBIGene:3815",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:primary_knowledge_source",
                            "attributes": [
                                {
                                    "attribute_type_id": "biolink:source_infores",
                                    "value": "infores:drugcentral"
                                }
                            ],
                            "value": "infores:drugcentral"
                        },
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": "infores:biothings-multiomics-biggim-drugresponse"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:ttd",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-multiomics-biggim-drugresponse",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:ttd",
                                "infores:drugcentral"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-multiomics-biggim-drugresponse"
                            ]
                        },
                        {
                            "resource_id": "infores:drugcentral",
                            "resource_role": "primary_knowledge_source"
                        }
                    ]
                },
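A quick way to spot this kind of edge in a response is to count the `primary_knowledge_source` entries in each edge's `sources`. The sketch below is only an illustrative check, not part of BTE or of any TRAPI validator; the function names are hypothetical, and the sample data is the `sources` list from the edge above, trimmed to the fields the check reads.

```python
from typing import Dict, List


def primary_sources(edge: dict) -> List[str]:
    """Return the resource_ids that an edge marks as primary knowledge sources."""
    return [
        source["resource_id"]
        for source in edge.get("sources", [])
        if source.get("resource_role") == "primary_knowledge_source"
    ]


def edges_without_single_primary(edges: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map edge id -> primary sources, for edges that do not have exactly one."""
    flagged = {}
    for edge_id, edge in edges.items():
        primaries = primary_sources(edge)
        if len(primaries) != 1:
            flagged[edge_id] = primaries
    return flagged


# The merged edge from above, reduced to resource_id and resource_role.
merged_edges = {
    "b46ae093d111737f77f8c728368c0040": {
        "sources": [
            {"resource_id": "infores:ttd", "resource_role": "primary_knowledge_source"},
            {
                "resource_id": "infores:biothings-multiomics-biggim-drugresponse",
                "resource_role": "aggregator_knowledge_source",
            },
            {
                "resource_id": "infores:service-provider-trapi",
                "resource_role": "aggregator_knowledge_source",
            },
            {
                "resource_id": "infores:drugcentral",
                "resource_role": "primary_knowledge_source",
            },
        ]
    }
}

flagged = edges_without_single_primary(merged_edges)
print(flagged)
# {'b46ae093d111737f77f8c728368c0040': ['infores:ttd', 'infores:drugcentral']}
```

Running the same check on the `knowledge_graph` edges of a full response should return an empty dict once records with different sources are no longer merged.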

Then check out the branch for biothings/api-respone-transform.js#63 (remember to build if needed to incorporate the change). Then run the same query again.

The response will change to two separate edges, one for TTD and one for drugcentral:

                "c0dd640042751eec03bf660be88d6075": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:5291",
                    "object": "NCBIGene:3815",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:primary_knowledge_source",
                            "attributes": [
                                {
                                    "attribute_type_id": "biolink:source_infores",
                                    "value": "infores:ttd"
                                }
                            ],
                            "value": "infores:ttd"
                        },
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": "infores:biothings-multiomics-biggim-drugresponse"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:ttd",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-multiomics-biggim-drugresponse",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:ttd"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-multiomics-biggim-drugresponse"
                            ]
                        }
                    ]
                },

                "070cd562c2640dc1f5cd8e4f2050fadf": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:5291",
                    "object": "NCBIGene:3815",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:primary_knowledge_source",
                            "attributes": [
                                {
                                    "attribute_type_id": "biolink:source_infores",
                                    "value": "infores:drugcentral"
                                }
                            ],
                            "value": "infores:drugcentral"
                        },
                        {
                            "attribute_type_id": "biolink:aggregator_knowledge_source",
                            "value": "infores:biothings-multiomics-biggim-drugresponse"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:drugcentral",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-multiomics-biggim-drugresponse",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:drugcentral"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-multiomics-biggim-drugresponse"
                            ]
                        }
                    ]
                },

Also, I think this will only be an issue for Multiomics biggim-drug-response, because that KP uses multiple sources.

By contrast, the other APIs are basically single-source:

  • multiomics ehr risk
  • multiomics wellness
  • multiomics clinicaltrials
  • text-mining targeted

@colleenXu Can this issue be closed? Relevant BTE code is now on Prod.

Confirmed that it's fixed now: there are two edges when posting the example query to https://bte.transltr.io/v1/smartapi/adf20dd6ff23dfe18e8e012bde686e31/query (Prod instance).