biothings / biothings_explorer

TRAPI service for BioThings Explorer

Home Page:https://api.bte.ncats.io

add `max research phase` to `treatsChembl` edges from mychem.info

andrewsu opened this issue · comments

Per a request from @mbrush, I'm creating this issue to add a max research phase attribute to the edges produced by the treatsChembl and treatsChembl-rev operations in the mychem.info OpenAPI annotation file. This info will be used in this CQS query template. Note that the attribute constraint will be applied within the CQS, but we need to make sure that BTE returns the attribute in the response.

test query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicates": [
            "biolink:in_clinical_trials_for"
          ]
        }
      },
      "nodes": {
        "n00": {
          "categories": [
            "biolink:ChemicalEntity"
          ]
        },
        "n01": {
          "categories": [
            "biolink:Disease"
          ],
          "ids": [
            "MONDO:0004979"
          ]
        }
      }
    }
  }
}
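
A minimal sketch of how this test query could be built and sent, assuming Python's standard library and the CI endpoint quoted in this issue. `build_query` and `post_query` are hypothetical helper names for illustration, not part of BTE.

```python
import json
import urllib.request

BTE_CI_URL = "https://bte.ci.transltr.io/v1/query"  # endpoint quoted in this issue


def build_query(disease_id: str) -> dict:
    """Build the one-hop TRAPI query shown above for a given disease CURIE."""
    return {
        "message": {
            "query_graph": {
                "edges": {
                    "e00": {
                        "subject": "n00",
                        "object": "n01",
                        "predicates": ["biolink:in_clinical_trials_for"],
                    }
                },
                "nodes": {
                    "n00": {"categories": ["biolink:ChemicalEntity"]},
                    "n01": {"categories": ["biolink:Disease"], "ids": [disease_id]},
                },
            }
        }
    }


def post_query(query: dict, url: str = BTE_CI_URL, timeout: float = 300.0) -> dict:
    """POST the query as JSON and return the decoded response (makes a network call)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `post_query(build_query("MONDO:0004979"))` would submit the query above; the timeout value is an arbitrary assumption.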

example edge returned from https://bte.ci.transltr.io/v1/query:

                "d29ff5f4006b1523fea5a64ef3b36292": {
                    "predicate": "biolink:in_clinical_trials_for",
                    "subject": "PUBCHEM.COMPOUND:1993",
                    "object": "MONDO:0004979",
                    "attributes": [
                        {
                            "attribute_type_id": "biolink:publications",
                            "value": [
                                "https://dailymed.nlm.nih.gov/dailymed/drugInfo.cfm?setid=e8983096-65e6-41f2-8b6f-bbf0a7227307",
                                "https://clinicaltrials.gov/search?id=%22NCT02584257%22",
                                "https://clinicaltrials.gov/search?id=%22NCT05292976%22",
                                "https://clinicaltrials.gov/search?id=%22NCT02097537%22",
                                "https://clinicaltrials.gov/search?id=%22NCT01907334%22",
                                "clinicaltrials:NCT02584257",
                                "clinicaltrials:NCT05292976",
                                "clinicaltrials:NCT02097537",
                                "clinicaltrials:NCT01907334",
                                "https://clinicaltrials.gov/ct2/results?id=%22NCT03505489%22",
                                "clinicaltrials:NCT03505489"
                            ],
                            "value_type_id": "linkml:Uriorcurie"
                        }
                    ],
                    "sources": [
                       ...
                    ]
                },

The fix for this issue would add a new entry under attributes for max research phase. As noted in https://github.com/NCATS-Tangerine/translator-api-registry/blob/biolink-4-update/mychem.info/openapi_full.yml#L1069, this info is available from mychem.info under chembl.drug_indications.max_phase_for_ind.

The changes should be live on Dev/CI within 10 min of the linked commit. BTE/Service Provider now create a "biolink:max_research_phase" edge-attribute from the MyChem chembl.drug_indications.max_phase_for_ind info.
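
To check that returned edges actually carry the new attribute, something like the following sketch could be run over a response's knowledge graph. It assumes the TRAPI edge shape shown in the example edge above; the helper names are hypothetical.

```python
from typing import Optional

MAX_PHASE_ATTRIBUTE = "biolink:max_research_phase"


def get_attribute_values(edge: dict, attribute_type_id: str) -> Optional[list]:
    """Return the values of the first matching attribute on the edge, or None if absent."""
    for attribute in edge.get("attributes") or []:
        if attribute.get("attribute_type_id") == attribute_type_id:
            value = attribute.get("value")
            # Normalize a scalar value to a one-element list for uniform handling.
            return value if isinstance(value, list) else [value]
    return None


def edges_missing_attribute(
    knowledge_graph: dict, attribute_type_id: str = MAX_PHASE_ATTRIBUTE
) -> list:
    """List the IDs of edges in a TRAPI knowledge graph that lack the attribute."""
    return [
        edge_id
        for edge_id, edge in knowledge_graph.get("edges", {}).items()
        if get_attribute_values(edge, attribute_type_id) is None
    ]
```

Given a decoded response, `edges_missing_attribute(response["message"]["knowledge_graph"])` would list edges without the attribute. Edges from sources other than the ChEMBL operations may legitimately lack it.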

However, the values returned right now don't match the biolink-model spec for "max research phase":

  • We're returning arrays of strings, and I've seen these values: "-1.0", "0.0", "0.5", "1.0", "2.0", "3.0", "4.0"
  • By contrast, the biolink-model max research phase is an enum with a fixed set of more descriptive string values

If we need to return values that conform to the biolink-model spec, I imagine we'd have to figure out all current possible values and what they mean, map them to the biolink-model's values, and add a BTE JQ/post-processing step with those mappings.

https://mychem.info/v1/query?q=_exists_:chembl.drug_indications&fields=chembl.drug_indications&facets=chembl.drug_indications.max_phase_for_ind

{
    "took": 12,
    "total": 8462,
    "max_score": 1,
    "facets": {
        "chembl.drug_indications.max_phase_for_ind": {
            "_type": "terms",
            "terms": [
                {
                    "count": 4554,
                    "term": 2
                },
                {
                    "count": 3996,
                    "term": 1
                },
                {
                    "count": 3142,
                    "term": 3
                },
                {
                    "count": 2707,
                    "term": 4
                },
                {
                    "count": 692,
                    "term": 0
                },
                {
                    "count": 492,
                    "term": -1
                }
            ],
            "other": 0,
            "missing": 0,
            "total": 15583
        }
    },
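
As a quick sanity check on the facet response above, a sketch like the following could confirm that the listed terms account for every counted value. The helper names are hypothetical, and it assumes the facet's `total` is the sum of all term counts, which holds for the numbers shown.

```python
def facet_counts(response: dict, field: str) -> dict:
    """Map each facet term to its count for the given field."""
    return {entry["term"]: entry["count"] for entry in response["facets"][field]["terms"]}


def facets_are_complete(response: dict, field: str) -> bool:
    """True when the listed terms cover all values (nothing under 'other' or 'missing')."""
    facet = response["facets"][field]
    listed = sum(entry["count"] for entry in facet["terms"])
    return listed == facet["total"] and facet["other"] == 0 and facet["missing"] == 0


FIELD = "chembl.drug_indications.max_phase_for_ind"

# Facet portion of the response shown above.
sample = {
    "facets": {
        FIELD: {
            "_type": "terms",
            "terms": [
                {"count": 4554, "term": 2},
                {"count": 3996, "term": 1},
                {"count": 3142, "term": 3},
                {"count": 2707, "term": 4},
                {"count": 692, "term": 0},
                {"count": 492, "term": -1},
            ],
            "other": 0,
            "missing": 0,
            "total": 15583,
        }
    }
}

print(sorted(facet_counts(sample, FIELD)))
print(facets_are_complete(sample, FIELD))
```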

Mapping:

  • "-1.0" -> "not_provided" (actually "clinical trial phase unknown")
  • "0.5" -> "pre_clinical_research_phase" (actually Phase 0 trials)
  • "1.0" -> "clinical_trial_phase_1"
  • "2.0" -> "clinical_trial_phase_2"
  • "3.0" -> "clinical_trial_phase_3"
  • "4.0" -> "clinical_trial_phase_4" (actually "approved"/"marketed")
  • not one of these values -> "not_provided" and raise an error that we'd catch in Sentry
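
The mapping above could be sketched as a small lookup with a fallback. This is an illustration only, not the BTE implementation: the function and logger names are hypothetical, normalizing inputs through `float` is an assumption, and standard-library `logging` stands in for whatever error reporting feeds Sentry.

```python
import logging

logger = logging.getLogger("max_research_phase")  # hypothetical logger name

# Mapping as proposed in this thread (ChEMBL value -> biolink-model enum value).
CHEMBL_TO_BIOLINK = {
    "-1.0": "not_provided",
    "0.5": "pre_clinical_research_phase",
    "1.0": "clinical_trial_phase_1",
    "2.0": "clinical_trial_phase_2",
    "3.0": "clinical_trial_phase_3",
    "4.0": "clinical_trial_phase_4",
}
FALLBACK = "not_provided"


def map_max_phase(raw) -> str:
    """Map a ChEMBL max-phase value to the biolink-model value, falling back on unknowns."""
    try:
        # Assumption: normalize numbers and strings alike to one decimal place, e.g. 4 -> "4.0".
        key = f"{float(raw):.1f}"
    except (TypeError, ValueError):
        key = None
    if key in CHEMBL_TO_BIOLINK:
        return CHEMBL_TO_BIOLINK[key]
    # Unknown value: report it so it can be noticed, then fall back.
    logger.error("Unexpected max_phase_for_ind value: %r", raw)
    return FALLBACK
```

For example, `map_max_phase("4.0")` gives `"clinical_trial_phase_4"`, while an unmapped value such as `"0.0"` is logged and mapped to `"not_provided"`.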

@tokebe

I don't think "0.0" actually exists in the data, so let's remove that handling/mapping. I've edited my post above.

There have been discussions in Slack with Chunlei and Dylan. They discovered an issue with the API that they're addressing, but it won't affect the actual values that we're working with (lab Slack links).

@mbrush

In the above post, I wrote mappings between Chembl's "max phase for indication" values (specific to each drug-indication pair) and the biolink-model's MaxResearchPhaseEnum values.

However, I'm not sure about "-1.0", "0.5", and "4.0", because the actual definitions in Chembl don't seem to quite match the available options.

I'm wondering if you have advice/opinions on this.

I've tested the linked PR biothings/bte_trapi_query_graph_handler#192 and it works as intended!

Example query

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "ids":["MEDDRA:10012374"],
                    "categories":["biolink:Disease"]
                },
                "n1": {
                    "categories":["biolink:SmallMolecule"]
                }
            },
            "edges": {
                "e1": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:tested_by_clinical_trials_of"]
                }
            }
        }
    }
}

Here's the before-after for some edges from chembl-treats operations:

max phase for ind = -1.0

Before:

                "efeef17260b03e60a9266d6d860f2ad1": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:135061",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "-1.0"
                            ]
                        },

After:

                "efeef17260b03e60a9266d6d860f2ad1": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:135061",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "not_provided"
                            ]
                        },

max phase for ind = 0.5

Before:

                "5e9942a90c62660599c90a4e6700fe73": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:28939",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "0.5"
                            ]
                        },

After:

                "5e9942a90c62660599c90a4e6700fe73": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:28939",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "pre_clinical_research_phase"
                            ]
                        },

max phase for ind = 1.0

Before:

                "6082fe0e1b9e1681ce88b2cf94abf3e0": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "PUBCHEM.COMPOUND:24802842",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "1.0"
                            ]
                        },

After:

                "6082fe0e1b9e1681ce88b2cf94abf3e0": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "PUBCHEM.COMPOUND:24802842",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "clinical_trial_phase_1"
                            ]
                        },

max phase for ind = 2.0

Before:

                "9b2dcc4dad93c04c227d80147a766d76": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "PUBCHEM.COMPOUND:4118151",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "2.0"
                            ]
                        },

After:

                "9b2dcc4dad93c04c227d80147a766d76": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "PUBCHEM.COMPOUND:4118151",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "clinical_trial_phase_2"
                            ]
                        },

max phase for ind = 3.0 and 4.0 (the disease MedDRA IDs are treated as the same entity, so both values appear on one edge)

Before:

                "87f29deccfa0f51de58d9ee3f1f98963": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:36791",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "4.0",
                                "3.0"
                            ]
                        },

After:

                "87f29deccfa0f51de58d9ee3f1f98963": {
                    "predicate": "biolink:tested_by_clinical_trials_of",
                    "subject": "MONDO:0002050",
                    "object": "CHEBI:36791",
                    "attributes": [

                        {
                            "attribute_type_id": "biolink:max_research_phase",
                            "value": [
                                "clinical_trial_phase_4",
                                "clinical_trial_phase_3"
                            ]
                        },
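
The before/after change on an edge can be reproduced with a sketch like this one, which rewrites the values of one attribute in place order. `map_attribute_values` is a hypothetical helper, and the edge below is trimmed to the fields shown in the last example.

```python
import copy
from typing import Callable


def map_attribute_values(
    edge: dict, attribute_type_id: str, mapper: Callable[[str], str]
) -> dict:
    """Return a copy of the edge with each value of the matching attribute passed through mapper."""
    updated = copy.deepcopy(edge)  # leave the input edge untouched
    for attribute in updated.get("attributes", []):
        if attribute.get("attribute_type_id") == attribute_type_id:
            attribute["value"] = [mapper(value) for value in attribute["value"]]
    return updated


before = {
    "predicate": "biolink:tested_by_clinical_trials_of",
    "subject": "MONDO:0002050",
    "object": "CHEBI:36791",
    "attributes": [
        {"attribute_type_id": "biolink:max_research_phase", "value": ["4.0", "3.0"]}
    ],
}

# Only the two values present on this edge are mapped here, for illustration.
phase_names = {"4.0": "clinical_trial_phase_4", "3.0": "clinical_trial_phase_3"}
after = map_attribute_values(before, "biolink:max_research_phase", phase_names.__getitem__)

print(after["attributes"][0]["value"])
```

Value order is preserved, so the merged edge keeps both phases in the order they were returned.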