biothings / biothings_explorer

TRAPI service for BioThings Explorer

Home Page: https://explorer.biothings.io


TRAPI 1.5: support source_record_urls

colleenXu opened this issue · comments

TRAPI 1.5 updated its documentation for "references". It now says that "urls to an external page for the association" should be put into the KG Edge's sources (in the source_record_urls field of one of the source objects), rather than into the biolink:publications edge-attribute.

We've gotten confirmation from the UI team that they're aware of this and plan to add support for it (Translator Slack link).

My implementation idea: use a different keyword in the response-mapping

  • right now, we use ref_url to tell BTE to add the field's values to the biolink:publications edge-attribute (previous issue)
  • maybe we could use source_url to tell BTE to use the field's values in the sources
    • add to the primary-knowledge-source's object? ("resource_role": "primary_knowledge_source")
    • add a key:value pair, where key = source_record_urls and value is an array of strings (make 1-element arrays for single urls) from the mapped response field.
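A minimal sketch of the idea in the list above, written in Python for illustration only. This is not BTE's actual code: the function name `add_source_record_urls` and the example.org URL are hypothetical. It assumes the mapped response field may hold either a single url string or a list of them, and wraps a single url in a 1-element array.

```python
from typing import Dict, List, Optional, Union


def add_source_record_urls(
    sources: List[Dict], urls: Optional[Union[str, List[str]]]
) -> List[Dict]:
    """Return a copy of `sources` with `urls` attached to the primary source.

    Hypothetical helper: `urls` stands in for the value of the response field
    mapped with the proposed `source_url` keyword.
    """
    if urls is None:
        url_list: List[str] = []
    elif isinstance(urls, str):
        url_list = [urls]  # make a 1-element array for a single url
    else:
        url_list = list(urls)

    updated = []
    for source in sources:
        source = dict(source)  # shallow copy so the input is not mutated
        is_primary = source.get("resource_role") == "primary_knowledge_source"
        if is_primary and url_list:
            source["source_record_urls"] = url_list
        updated.append(source)
    return updated


sources = [
    {
        "resource_id": "infores:bindingdb",
        "resource_role": "primary_knowledge_source",
    },
    {
        "resource_id": "infores:biothings-bindingdb",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:bindingdb"],
    },
]

# Placeholder url, not a real BindingDB record page.
result = add_source_record_urls(sources, "https://example.org/record/1")
```

Only the object with "resource_role": "primary_knowledge_source" gets the new key; the aggregator objects are left as they were.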

We'd do this for:

  • BioThings BindingDB (bindingdb webpages)
  • BioThings rare-source (rare-source webpages; do special reverses so we can retrieve urls in both directions?)
  • MyGene clingen operations (clingen webpages; do special reverses so we can retrieve urls in both directions?)

For development, use these yamls:

We'll add overrides to these yamls when we deploy this feature.


And WE ARE WAITING AND WON'T MERGE the SmartAPI yaml PRs until AFTER this feature is deployed to Prod:

After these SmartAPI yaml PRs are merged, overrides to this branch's yamls can be removed from BTE.

@colleenXu Do you have any example queries that would hit BindingDB and likely return source_url with the above override?

@tokebe Sorry for seeing this late >.<

Query for "protein-ligand" operation

Start with CD47 (aka UniProtKB:Q08722 / NCBIGene:961)

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:Gene"],
                    "ids": ["UniProtKB:Q08722"]
                },
                "n1": {
                    "categories": ["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:physically_interacts_with"]
                }
            }
        }
    }
}

Currently this is one of the edges:

                "627d6da60b47a3585f493321cb491e82": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "NCBIGene:961",
                    "object": "PUBCHEM.COMPOUND:155537282",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:publications",
                            "value": [
                                "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=50530406&enzyme=Leukocyte+surface+antigen+CD47&column=ki&startPg=0&Increment=50&submit=Search",
                                "PMID:31403795",
                                "doi:10.1021/acs.jmedchem.9b00024"
                            ],
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:bindingdb",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-bindingdb",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:bindingdb"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-bindingdb"
                            ]
                        }
                    ]
                },

With the different handling for source_url, the long url should be in the primary-source-object instead:

                "627d6da60b47a3585f493321cb491e82": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "NCBIGene:961",
                    "object": "PUBCHEM.COMPOUND:155537282",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:publications",
                            "value": [
                                "PMID:31403795",
                                "doi:10.1021/acs.jmedchem.9b00024"
                            ],
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:bindingdb",
                            "resource_role": "primary_knowledge_source",
                            "source_record_urls":  [
                                "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=50530406&enzyme=Leukocyte+surface+antigen+CD47&column=ki&startPg=0&Increment=50&submit=Search"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-bindingdb",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:bindingdb"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-bindingdb"
                            ]
                        }
                    ]
                },

Query for "ligand-protein" operation

Start with chemical INCHIKEY:NZLXOECMDRNZDN-GUYCJALGSA-N

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories": ["biolink:SmallMolecule"],
                    "ids": ["INCHIKEY:NZLXOECMDRNZDN-GUYCJALGSA-N"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                }
            },
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:physically_interacts_with"]
                }
            }
        }
    }
}

Currently there's only 1 edge

            "edges": {
                "00c844be0dd7da974fcb364b5bc9c1e0": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:134553288",
                    "object": "NCBIGene:187",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:publications",
                            "value": [
                                "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=456871&enzyme=Apelin+receptor&column=ki&startPg=0&Increment=50&submit=Search"
                            ],
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                        {
                            "resource_id": "infores:bindingdb",
                            "resource_role": "primary_knowledge_source"
                        },
                        {
                            "resource_id": "infores:biothings-bindingdb",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:bindingdb"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-bindingdb"
                            ]
                        }
                    ]
                }
            }
        },

With the different handling for source_url, the long url should be in the primary-source-object instead - so there'll be no edge attributes

            "edges": {
                "00c844be0dd7da974fcb364b5bc9c1e0": {
                    "predicate": "biolink:physically_interacts_with",
                    "subject": "PUBCHEM.COMPOUND:134553288",
                    "object": "NCBIGene:187",
                    "attributes": [],
                    "sources": [
                        {
                            "resource_id": "infores:bindingdb",
                            "resource_role": "primary_knowledge_source",
                            "source_record_urls":  [
                                "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp?energyterm=kJ/mole&tag=r21&monomerid=456871&enzyme=Apelin+receptor&column=ki&startPg=0&Increment=50&submit=Search"
                            ]
                        },
                        {
                            "resource_id": "infores:biothings-bindingdb",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:bindingdb"
                            ]
                        },
                        {
                            "resource_id": "infores:service-provider-trapi",
                            "resource_role": "aggregator_knowledge_source",
                            "upstream_resource_ids": [
                                "infores:biothings-bindingdb"
                            ]
                        }
                    ]
                }
            }
        },
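One way to check a BindingDB edge from a response against the expected shape shown above is a small helper like the following. This is only a sketch in Python: `check_bindingdb_edge` is a hypothetical name, and the check assumes the edge comes from an operation that uses source_url (edges from operations that still use ref_url may legitimately keep urls in biolink:publications). The edges below are trimmed versions of the first example in this comment.

```python
from typing import Dict, List


def check_bindingdb_edge(edge: Dict) -> List[str]:
    """Return a list of problems found; an empty list means the edge looks right."""
    problems = []

    # No urls should remain in the biolink:publications edge-attribute.
    for attribute in edge.get("attributes", []):
        if attribute.get("attribute_type_id") != "biolink:publications":
            continue
        values = attribute.get("value", [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            if value.startswith(("http://", "https://")):
                problems.append("url still in biolink:publications: " + value)

    # The primary knowledge source should carry a non-empty source_record_urls.
    primary = [
        source
        for source in edge.get("sources", [])
        if source.get("resource_role") == "primary_knowledge_source"
    ]
    if not any(source.get("source_record_urls") for source in primary):
        problems.append("no source_record_urls on the primary knowledge source")

    return problems


bindingdb_url = (
    "http://www.bindingdb.org/jsp/dbsearch/PrimarySearch_ki.jsp"
    "?energyterm=kJ/mole&tag=r21&monomerid=50530406"
    "&enzyme=Leukocyte+surface+antigen+CD47&column=ki&startPg=0"
    "&Increment=50&submit=Search"
)

current_edge = {
    "attributes": [
        {
            "attribute_type_id": "biolink:publications",
            "value": [bindingdb_url, "PMID:31403795", "doi:10.1021/acs.jmedchem.9b00024"],
        }
    ],
    "sources": [
        {"resource_id": "infores:bindingdb", "resource_role": "primary_knowledge_source"}
    ],
}

expected_edge = {
    "attributes": [
        {
            "attribute_type_id": "biolink:publications",
            "value": ["PMID:31403795", "doi:10.1021/acs.jmedchem.9b00024"],
        }
    ],
    "sources": [
        {
            "resource_id": "infores:bindingdb",
            "resource_role": "primary_knowledge_source",
            "source_record_urls": [bindingdb_url],
        }
    ],
}

current_problems = check_bindingdb_edge(current_edge)
expected_problems = check_bindingdb_edge(expected_edge)
```

With these inputs, the current edge is flagged on both counts and the expected edge passes.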

I've updated my comment above with all of the adjusted SmartAPI yamls for this feature. We'll add overrides to these yamls for this feature later.

I didn't adjust the MyGene reverse operation (geneToDisease) to retrieve the url, because I ran into an issue with jmespath (comment, issue). I didn't run into this issue when adjusting the BioThings rare-source operation, probably because the value of the raresource.disease field is always an array (even if there's only 1 element).

I don't think this is a blocking issue for this feature though.

@tokebe

I've opened a PR in bte-server to add the needed overrides: biothings/bte-server#25

Also: could we remove the source_record_urls field when it has no value? Right now it's in every primary knowledge source object, when it'll only be filled in a few cases.

(based on a quick look)

Latest commit should fix this.
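For reference, the behavior requested above (leave out source_record_urls when it has no value) could be sketched like this. It is Python for illustration only and not the code from the commit; `prune_empty_source_record_urls` is a hypothetical name, and the example.org URL is a placeholder.

```python
from typing import Dict, List


def prune_empty_source_record_urls(sources: List[Dict]) -> List[Dict]:
    """Return copies of the source objects without empty source_record_urls."""
    pruned = []
    for source in sources:
        source = dict(source)  # shallow copy so the input is not mutated
        # Treat a missing key, None, and an empty array the same way.
        if not source.get("source_record_urls"):
            source.pop("source_record_urls", None)
        pruned.append(source)
    return pruned


sources = [
    {
        "resource_id": "infores:bindingdb",
        "resource_role": "primary_knowledge_source",
        "source_record_urls": [],
    },
    {
        "resource_id": "infores:biothings-bindingdb",
        "resource_role": "aggregator_knowledge_source",
        "upstream_resource_ids": ["infores:bindingdb"],
    },
]

pruned = prune_empty_source_record_urls(sources)

# A source object that does have urls keeps them.
kept = prune_empty_source_record_urls(
    [
        {
            "resource_id": "infores:bindingdb",
            "resource_role": "primary_knowledge_source",
            "source_record_urls": ["https://example.org/record/1"],
        }
    ]
)
```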

Note: we can keep the override to BioThings rare-source, but I reverted the branch's yaml so it's using ref_url (commit). In other words, it's not using this feature anymore. These links provide related literature info, but don't explain where the original association came from (GARD), so they should be edge-attributes.

We'd like to use source_url for BioThings PFOCR's pfocrUrl field (Ref Slack discussion with @AlexanderPico and NCATS-Tangerine/translator-api-registry#132 (comment)).

However, we needed to adjust our "unique edge hashing" so a TRAPI edge could have multiple values in the source_record_urls array. I think this is needed for our edges from BioThings PFOCR - since I don't think we want a separate edge for every subject/object/figure combo. Currently, we merge records so an edge can contain info from multiple figures with the same triple (subject/object entities).
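A rough sketch of the merging described above, in Python for illustration only. It assumes records are grouped by the subject/predicate/object triple alone; BTE's real edge hash may include other fields, so treat the key here as an assumption. The name `merge_by_triple`, the node IDs, the predicate, and the example.org URLs are all placeholders.

```python
from typing import Dict, List, Tuple


def merge_by_triple(records: List[Dict]) -> List[Dict]:
    """Merge records that share a triple, collecting their urls on one edge."""
    merged: Dict[Tuple[str, str, str], Dict] = {}
    for record in records:
        # The urls are deliberately NOT part of the key, so records that
        # differ only in their figure url end up on the same edge.
        key = (record["subject"], record["predicate"], record["object"])
        edge = merged.setdefault(
            key,
            {
                "subject": record["subject"],
                "predicate": record["predicate"],
                "object": record["object"],
                "source_record_urls": [],
            },
        )
        for url in record.get("source_record_urls", []):
            if url not in edge["source_record_urls"]:  # skip duplicates
                edge["source_record_urls"].append(url)
    return list(merged.values())


def make_record(obj: str, url: str) -> Dict:
    """Build a placeholder record; every value here is made up."""
    return {
        "subject": "NCBIGene:1",
        "predicate": "biolink:related_to",
        "object": obj,
        "source_record_urls": [url],
    }


records = [
    make_record("NCBIGene:2", "https://example.org/figure/A"),
    make_record("NCBIGene:2", "https://example.org/figure/B"),
    make_record("NCBIGene:2", "https://example.org/figure/A"),  # duplicate
    make_record("NCBIGene:3", "https://example.org/figure/A"),
]

edges = merge_by_triple(records)
```

Here the three records for the first triple collapse into one edge with two urls, and the record with a different object stays a separate edge.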


We have two code changes that have been deployed to dev/CI, and we want to patch them into Test, where the rest of the code for this feature is:

The BTE code + old overrides (having ncats_rare_source rather than pfocr) were deployed to Prod as part of the Octopus release. I tested and it's live.

However, the latest PFOCR stuff is only on dev/CI right now. I think we want to get this in as a patch to Test/Prod.


So I haven't merged the SmartAPI yaml PR yet, or added the override removal to the chore.

See #811 (comment) for details on how we're going to remove the overrides/deploy the PFOCR stuff.

Related PRs deployed to Prod.