creative-mode "Explain-style" prototype for Pathfinder option (Translator Jan 2024 Relay)
colleenXu opened this issue · comments
For the Jan 2024 Relay, Translator teams are supposed to bring prototypes for the next-creative-mode choices: Pathfinder and multi-curie. Our team has chosen to focus on Pathfinder.
This issue is for discussing / working on this prototype.
Current assumptions:
- only 1 ID on each QNode, aka "finding paths between only two entities"
- We can implement this by having BTE run "Explain queries", first 1 QEdge (are they directly connected), then 2 QEdges (1 intermediate node), etc...
- supposed to be "on Dev" by Relay, but we may be fine if we don't get this done: bring a set of template queries, plans on how we would implement it and how difficult/easy it is
- if need be, we could narrow scope (categories for starting nodes or intermediate nodes, predicates?)
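A rough sketch of the "first 1 QEdge, then 2 QEdges, etc." idea, in Python for illustration only (BTE itself is not written in Python). The helper name `build_explain_query` and the node/edge key naming are made up here. For queries with intermediates, it assumes the last edge points from the end node back to the last intermediate, which matches the 1-intermediate query graphs used later in this issue; the direction for longer chains is a guess.

```python
def build_explain_query(start_id, end_id, n_intermediates=0):
    """Build a TRAPI-style query graph joining two pinned nodes via N open nodes."""
    nodes = {"n0": {"ids": [start_id]}}
    for i in range(1, n_intermediates + 1):
        # Intermediates are left as open as possible.
        nodes[f"n{i}"] = {"categories": ["biolink:NamedThing"]}
    nodes[f"n{n_intermediates + 1}"] = {"ids": [end_id]}

    edges = {}
    for i in range(n_intermediates + 1):
        subject, obj = f"n{i}", f"n{i + 1}"
        # Assumption: with intermediates, the final edge runs from the end node
        # back to the last intermediate (start -> intermediate <- end).
        if n_intermediates > 0 and i == n_intermediates:
            subject, obj = obj, subject
        edges[f"e{i}"] = {"subject": subject, "object": obj}
    return {"message": {"query_graph": {"nodes": nodes, "edges": edges}}}


# Queries of increasing length for imatinib and asthma.
for n in range(3):
    query = build_explain_query("PUBCHEM.COMPOUND:5291", "MONDO:0004979", n)
    print(n, query["message"]["query_graph"]["edges"])
```

Each query could be run in turn, stopping once enough paths are found or a time budget runs out.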
Running Explain-type queries for the imatinib use cases
Summary:
- It takes > 5 min to run the Explain query with 1 intermediate...
- BTE has the expected gene results (highly-ranked too)
- but lacks the explanatory intermediates like "mast cells", "immune cell activation", "cell cycle". I think this comes from (1) not much association data for biolink:Cell (cell types) and (2) the biolink:BiologicalProcessOrActivity association data (includes MolecularActivity, PhysiologicalProcess, PathologicalProcess, Pathway) mostly connects to Genes. Very little connects to other categories like the Disease / Chemical in these use cases.
imatinib ➡️ asthma
Should go through c-kit, mast cells, and immune cell activation.
I got these IDs from the SRI Name Resolver: PUBCHEM.COMPOUND:5291 for imatinib, MONDO:0004979 for asthma.
There are direct edges for imatinib ➡️ asthma
First, I ran an Explain-query w/o any intermediates (1 QEdge connecting them):
- it runs quickly, only 14 s
- 4 Edges found:
- treats: text-mining targeted
- associated_with: multiomics ehr risk. with qualifiers, the statement is "imatinib is associated with decreased likelihood of asthma"
- has_adverse_event: from automat drugcentral (faers) and mychem drugcentral (likely the same original data)
full response: imatinib-direct-asthma.json
click to see query-graph
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"name": "imatinib"
},
"n1": {
"ids":["MONDO:0004979"],
"name": "asthma"
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
imatinib ➡️ 1 intermediate ⬅️ asthma
Then, I ran an Explain-query w/ 1 intermediate QNode. full response here: imatinib-inter-asthma-4.json
click to see query-graph
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"name": "imatinib"
},
"n1": {
"categories":["biolink:NamedThing"]
},
"n2": {
"ids":["MONDO:0004979"],
"categories":["biolink:DiseaseOrPhenotypicFeature"],
"name": "asthma"
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
},
"e1": {
"subject": "n2",
"object": "n1",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
}
}
}
}
}
Overall
- it took 7 min to run locally (w/o threading or caching)
- 718 results
- Big Caveat: I also set the QNode with the asthma ID to the category DiseaseOrPhenotypicFeature. If I removed this category so BTE solely uses the top category from SRI NodeNorm, then KIT isn't in the results (see this response: imatinib-inter-asthma-3.json).
- I suspect that the asthma - KIT edge from biolink-api/monarch is retrieved by the PhenotypicFeature (HP ID) -> Gene operations, not the Disease (MONDO ID) -> Gene operations.
- Note: I set a predicate list for both QEdges to exclude ones that didn't seem useful or may create unhelpful self-edges: superclass_of, subclass_of, broad_match / narrow_match, close_match / exact_match / same_as
Expected results
- KIT (Gene) is the top result! Note: only 1 edge for asthma ➡️ KIT from biolink-api (monarch) VS lots of imatinib ➡️ KIT edges from multiple sources
- "mast cells": no exact match. There were some intermediate diseases that seemed related:
- systemic mastocytosis - rank 21
- mastocytosis - rank 34
- aggressive systemic mastocytosis - rank 674
- "immune cell activation": no exact match (I was looking for a related process / activity / pathway). But there are disease and gene intermediates that seem to be related to the immune system and to cell proliferation.
- Genes PDGFRA (rank 2) and PDGFB (rank 55) seem related to asthma, immune regulation, and imatinib
- immune system disorder intermediates like idiopathic hypereosinophilic syndrome (rank 33), eosinophilic pneumonia (rank 138)
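The ranks quoted above can be looked up from a saved response with a small script. This is a minimal sketch, assuming the response follows the usual TRAPI layout (`message.results[].node_bindings` and `message.knowledge_graph.nodes`) and that results are already sorted best-first. The function name `find_ranks` and the `EX:` IDs in the toy response are made up for illustration.

```python
def find_ranks(response, qnode_id, name_substring):
    """Return (rank, node_id, name) for results whose node bound to qnode_id matches."""
    message = response["message"]
    kg_nodes = message["knowledge_graph"]["nodes"]
    needle = name_substring.lower()
    hits = []
    # Assumption: results are sorted best-first, so rank = position + 1.
    for rank, result in enumerate(message["results"], start=1):
        for binding in result["node_bindings"].get(qnode_id, []):
            name = kg_nodes.get(binding["id"], {}).get("name") or ""
            if needle in name.lower():
                hits.append((rank, binding["id"], name))
    return hits


# Toy response with made-up IDs; a real one would be loaded with json.load().
toy_response = {
    "message": {
        "knowledge_graph": {
            "nodes": {
                "EX:1": {"name": "KIT", "categories": ["biolink:Gene"]},
                "EX:2": {"name": "systemic mastocytosis", "categories": ["biolink:Disease"]},
            }
        },
        "results": [
            {"node_bindings": {"n1": [{"id": "EX:1"}]}},
            {"node_bindings": {"n1": [{"id": "EX:2"}]}},
        ],
    }
}
print(find_ranks(toy_response, "n1", "mastocytosis"))
```

Matching on a name substring is crude (it would miss synonyms), so it only helps for a quick first pass.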
intermediate node categories analysis
I searched the response using the console logs for the intermediate node categories, and got this list:
- Disease, PhenotypicFeature
- Gene
- Chem: SmallMolecule, ChemicalEntity, MolecularMixture
- 1 PhysiologicalProcess, pregnancy (imatinib contraindicated_for pregnancy and asthma correlated_with pregnancy)
- 1 Procedure, liver transplantation (edges from multiomics ehr risk)
Interestingly, Pathway didn't show up at all - BiologicalProcess, MolecularActivity, PhysiologicalProcess, PathologicalProcess all did, before the intersecting of intermediate nodes.
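To illustrate why a category can appear during one hop and then vanish: only nodes reached from both starting IDs survive the intersection. The sketch below is not BTE's actual code. `intersect_hops`, `categories_of`, and the `EX:` IDs are made up, and each hop is simplified to a mapping of node ID to a set of categories.

```python
def intersect_hops(first_hop, second_hop):
    """Keep only nodes reached from both pinned nodes, merging their categories."""
    shared_ids = first_hop.keys() & second_hop.keys()
    return {node_id: first_hop[node_id] | second_hop[node_id] for node_id in shared_ids}


def categories_of(nodes):
    """Sorted list of every category present in a node mapping."""
    return sorted({category for cats in nodes.values() for category in cats})


# Made-up example: the MolecularActivity node is only reached from one side.
imatinib_hop = {"EX:gene1": {"Gene"}, "EX:activity1": {"MolecularActivity"}}
asthma_hop = {"EX:gene1": {"Gene", "Protein"}, "EX:device1": {"Device"}}

shared = intersect_hops(imatinib_hop, asthma_hop)
print(categories_of(imatinib_hop))
print(categories_of(asthma_hop))
print(categories_of(shared))
```

So a category showing up in a per-hop log only means some node of that category was reached from one side.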
Console logs for intermediate node categories
After getting the intersection of intermediate nodes, the final console log for the categories was:
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["Disease","PhenotypicFeature","PhysiologicalProcess","BiologicalProcess",
"SmallMolecule","ChemicalExposure","Drug","Gene",
"ChemicalEntity","Procedure","MolecularMixture"] +76ms
But before then, during the imatinib hop, the categories were:
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["SmallMolecule","PhysiologicalProcess","Disease","PhenotypicFeature",
"Gene","Protein","PathologicalProcess","Procedure",
"ChemicalEntity","OrganismTaxon","Polypeptide","Cell",
"Phenomenon","Drug","MolecularActivity","DiseaseOrPhenotypicFeature",
"CellularComponent","GrossAnatomicalStructure","AnatomicalEntity","Plant",
"MolecularMixture","ComplexMolecularMixture"] +58ms
And during the asthma hop (before the intersecting began), the categories were:
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["Disease","PhenotypicFeature","Device","ClinicalIntervention",
"Procedure","Event","ClinicalAttribute","PhysiologicalProcess",
"BiologicalProcess","Activity","ComplexMolecularMixture","Gene",
"Protein","ChemicalExposure","SmallMolecule","Drug",
"Publication","InformationContentEntity","PopulationOfIndividualOrganisms","EnvironmentalExposure",
"MolecularMixture","ChemicalEntity","SequenceVariant","OrganismAttribute"] +365ms
imatinib ➡️ CML (Chronic myelogenous leukemia)
Should go through BCR-ABL and cell cycle
I got these IDs from the SRI Name Resolver: PUBCHEM.COMPOUND:5291 for imatinib, MONDO:0011996 for CML.
There are direct edges for imatinib ➡️ CML and its descendants
First, I ran an Explain-query w/o any intermediates (1 QEdge connecting them):
- it runs quickly, only 13 s
- lots of direct edges, including "treats"
- also some edges to descendants
full response: imatinib-direct-cml.json
click to see query-graph
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"name": "imatinib"
},
"n1": {
"ids":["MONDO:0011996"],
"name": "cml"
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
imatinib ➡️ 1 intermediate ⬅️ CML
Then, I ran an Explain-query w/ 1 intermediate QNode. full response here: imatinib-inter-cml-2.json
click to see query-graph
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"name": "imatinib"
},
"n1": {
"categories":["biolink:NamedThing"]
},
"n2": {
"ids":["MONDO:0011996"],
"name": "cml"
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
},
"e1": {
"subject": "n2",
"object": "n1",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
}
}
}
}
}
Overall
- it took 6 min 46 s to run locally (w/o threading or caching)
- 1661 results
- Note: I set a predicate list for both QEdges to exclude ones that didn't seem useful or may create unhelpful self-edges: superclass_of, subclass_of, broad_match / narrow_match, close_match / exact_match / same_as
Expected results
- BCR-ABL: separately, BCR and ABL1 are the top gene results (rank 2 and 3), with lots of edges to imatinib and to CML.
- There's also related results like "Fusion Proteins, bcr-abl" (rank 770, umls id, from biothings semmeddb) and "Tyrosine-protein kinase ABL1 (ABL)" (not in top 1000, TTD.TARGET id, from biothings ttd)
- "cell cycle": no exact match.
- There may be Gene intermediates related to the cell cycle, like CCND1
- There were some PhysiologicalProcess intermediates that seem related but they're general concepts, from biothings semmeddb only, and didn't score highly (not in top 1000 results unless otherwise noted):
- cell proliferation (rank 992)
- autophagy (rank 994)
- apoptosis (rank 998)
- growth
- cell growth
- cell survival
- signal transduction
- cell death
intermediate node categories analysis
I searched the response using the console logs for the intermediate node categories, and got this list:
- Disease, PhenotypicFeature
- Gene, Protein
- Chem: SmallMolecule, Drug, MolecularMixture, ChemicalEntity
- PhysiologicalProcess
- Procedure
- Cell (but entities and edges aren't helpful or interesting. Ex: both imatinib and CML are located in "bone marrow cells", edges from biothings semmeddb)
- 1 AnatomicalEntity: blood (both imatinib and CML are located in the blood, edges from biothings semmeddb)
- 1 MolecularActivity: "Down-Regulation" but the edges weren't helpful - imatinib causes down-regulation and CML includes down-regulation (edges from biothings semmeddb)
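An alternative to reading the console logs is counting categories directly from the response. This is a minimal sketch, assuming the usual TRAPI layout and that the intermediate QNode is keyed `n1` as in the query graphs above. `intermediate_category_counts` and the `EX:` IDs are made up for illustration.

```python
from collections import Counter


def intermediate_category_counts(response, qnode_id):
    """Count how many unique nodes bound to qnode_id carry each category."""
    message = response["message"]
    kg_nodes = message["knowledge_graph"]["nodes"]
    # Collect unique intermediate IDs first so repeated results don't inflate counts.
    bound_ids = set()
    for result in message["results"]:
        for binding in result["node_bindings"].get(qnode_id, []):
            bound_ids.add(binding["id"])
    counts = Counter()
    for node_id in bound_ids:
        for category in kg_nodes.get(node_id, {}).get("categories") or []:
            counts[category] += 1
    return counts


# Toy response with made-up IDs; a real one would be loaded with json.load().
toy_response = {
    "message": {
        "knowledge_graph": {
            "nodes": {
                "EX:1": {"categories": ["biolink:Gene"]},
                "EX:2": {"categories": ["biolink:Gene", "biolink:Protein"]},
                "EX:3": {"categories": ["biolink:AnatomicalEntity"]},
            }
        },
        "results": [
            {"node_bindings": {"n1": [{"id": "EX:1"}]}},
            {"node_bindings": {"n1": [{"id": "EX:2"}]}},
            {"node_bindings": {"n1": [{"id": "EX:3"}]}},
            {"node_bindings": {"n1": [{"id": "EX:1"}]}},
        ],
    }
}
print(intermediate_category_counts(toy_response, "n1").most_common())
```

Categories with a count of 1 (like blood or "Down-Regulation" above) are then easy to spot and inspect by hand.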
Console logs for intermediate node categories
After getting the intersection of intermediate nodes, the final console log for the categories was:
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["Disease","Gene","SmallMolecule","Drug",
"MolecularMixture", "PhenotypicFeature","Procedure","ChemicalEntity",
"Polypeptide","PhysiologicalProcess","Protein","PathologicalProcess",
"AnatomicalEntity","GrossAnatomicalStructure","Cell","MolecularActivity"] +37ms
But before then, during the imatinib hop, the categories were (basically the same as for the imatinib-asthma testing)
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["SmallMolecule","PhenotypicFeature","PhysiologicalProcess","Disease",
"Gene","Protein","PathologicalProcess","Procedure",
"ChemicalEntity","OrganismTaxon","Polypeptide","Cell",
"Phenomenon","Drug","MolecularActivity","DiseaseOrPhenotypicFeature",
"CellularComponent","GrossAnatomicalStructure","AnatomicalEntity","Plant",
"MolecularMixture","ComplexMolecularMixture"] +94ms
And during the CML hop (before the intersecting began), the categories were:
bte:biothings-explorer-trapi:QEdge Collected entity ids in records:
["Disease","Gene","Procedure","SmallMolecule",
"Drug","ChemicalEntity","MolecularMixture","Polypeptide",
"PhenotypicFeature","SequenceVariant","Protein","Pathway",
"CellularComponent","NucleicAcidEntity","PhysiologicalProcess","PathologicalProcess",
"BiologicalEntity","Cohort","OrganismTaxon","Virus",
"Device","GrossAnatomicalStructure","AnatomicalEntity","Cell",
"PopulationOfIndividualOrganisms","MolecularActivity","ComplexMolecularMixture"] +103ms
Basic implementation ideas
EDIT: after discussion with Andrew 1/8.
1. I think the creative-mode query would be like this (click to expand): QNodes aren't set with any biolink-category, the QEdge predicate is set to "related_to"
I think having no QEdge predicate would also make sense, but our current creative-mode won't run when I don't specify a predicate: I get 0 results and the warning log bte:biothings-explorer-trapi:inferred-mode Inferred Mode edge must specify a predicate. Your query terminates. +0ms
This is for imatinib ➡️ CML (Chronic myelogenous leukemia)
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"]
},
"n1": {
"ids":["MONDO:0011996"]
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1",
"predicates": ["biolink:related_to"],
"knowledge_type": "inferred"
}
}
}
}
}
- BTE could look up the starting IDs with NodeNorm and retrieve the QNode categories early, before picking the matching templateGroups. Does BTE currently do this (I think it does)?
- BTE should then match the query to the matching Pathfinder templateGroups. All templateGroups will at least have the Explain-query w/ 0 intermediates (see if they or their descendants are directly connected).
click to see generic templates for 0 and 1 intermediates
Notes:
- The templates don't set categories for the creativeQuerySubject / creativeQueryObject because I think it'd be best if BTE plugged in the categories from the NodeNorm ID-lookup. Right now, BTE doesn't seem to be doing this and raises an error (also noted in the next "Issues" section).
- I set a predicate list for QEdges to exclude ones that didn't seem useful or may create unhelpful self-edges: superclass_of, subclass_of, broad_match / narrow_match, close_match / exact_match / same_as
- I'm assuming that the intermediates are what matters, so it's better for the 1-intermediate template to be startingID1 ➡️ intermediate ⬅️ startingID2 (rather than a one-direction path from startingID1 ➡️ intermediate ➡️ startingID2)
First template: 0 intermediates
{
"message": {
"query_graph": {
"nodes": {
"creativeQuerySubject": {
},
"creativeQueryObject": {
}
},
"edges": {
"eA": {
"subject": "creativeQuerySubject",
"object": "creativeQueryObject",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
}
}
}
}
}
Second template: 1 intermediate
{
"message": {
"query_graph": {
"nodes": {
"creativeQuerySubject": {
},
"nA": {
"categories":["biolink:NamedThing"]
},
"creativeQueryObject": {
}
},
"edges": {
"eA": {
"subject": "creativeQuerySubject",
"object": "nA",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
},
"eB": {
"subject": "creativeQueryObject",
"object": "nA",
"predicates": [
"biolink:related_to_at_instance_level",
"biolink:disease_has_location", "biolink:location_of_disease",
"biolink:composed_primarily_of", "biolink:primarily_composed_of",
"biolink:has_chemical_role",
"biolink:has_member", "biolink:member_of"
]
}
}
}
}
}
- For the first template (0 intermediates), BTE will have 1 result if there are edges between the two starting IDs. But BTE could return a lot of results when there's 1 intermediate (1 per unique intermediate).
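The intended "plug in the IDs and looked-up categories" step could look roughly like this. It is a sketch of the idea, not BTE's implementation. `fill_template` is a made-up name, and the categories passed in the example are stand-ins for whatever the NodeNorm lookup would return.

```python
import copy
import json


def fill_template(template, subject_id, subject_categories, object_id, object_categories):
    """Return a copy of the template with starting IDs and categories filled in."""
    filled = copy.deepcopy(template)
    nodes = filled["message"]["query_graph"]["nodes"]
    for key, node_id, categories in (
        ("creativeQuerySubject", subject_id, subject_categories),
        ("creativeQueryObject", object_id, object_categories),
    ):
        node = nodes[key]
        node["ids"] = [node_id]
        # Treat a missing "categories" field as an empty list, then add the
        # looked-up categories without duplicating any the template already has.
        merged = list(node.get("categories", []))
        merged += [c for c in categories if c not in merged]
        node["categories"] = merged
    return filled


# The 0-intermediate template from this issue, with the predicate list omitted for brevity.
direct_template = {
    "message": {
        "query_graph": {
            "nodes": {"creativeQuerySubject": {}, "creativeQueryObject": {}},
            "edges": {
                "eA": {"subject": "creativeQuerySubject", "object": "creativeQueryObject"}
            },
        }
    }
}

filled = fill_template(
    direct_template,
    "PUBCHEM.COMPOUND:5291", ["biolink:ChemicalEntity"],  # stand-in categories
    "MONDO:0011996", ["biolink:DiseaseOrPhenotypicFeature"],
)
print(json.dumps(filled["message"]["query_graph"]["nodes"], indent=2))
```

Defaulting the missing field to an empty list is the part that would sidestep a "categories is not iterable" style failure when a template leaves the starting QNodes empty.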
Issues for implementation
A. Template-group matching -> Andrew agrees with this
I think Pathfinder queries should only use Pathfinder templates (when both QNodes have 1 ID and knowledge_type: inferred), and vice-versa
- The Pathfinder templates wouldn't work for MVPs 1-2 because they're too general
- The MVP1-2 templates won't work for Pathfinder because they use is_set: true for intermediate nodes, so each result represents a unique "answer" (the QNode at the open end of the Predict-type query). But for Pathfinder/Explain-type where we set both starting QNodes to specific IDs, using is_set: true for the intermediate nodes will collapse all the "answers" into 1 giant result - which I assume we don't want.
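The routing condition described here (both QNodes have exactly 1 ID and the edge is knowledge_type: inferred) is simple to state as a check. A minimal sketch follows. `is_pathfinder_query` is a made-up name, and restricting to exactly 2 QNodes and 1 QEdge is an assumption based on the creative-mode query shown earlier in this issue.

```python
def is_pathfinder_query(query_graph):
    """True when the query pins both ends of a single inferred edge to one ID each."""
    nodes = query_graph["nodes"]
    edges = query_graph["edges"]
    # Assumption: a Pathfinder query has exactly 2 QNodes joined by 1 QEdge.
    if len(nodes) != 2 or len(edges) != 1:
        return False
    (edge,) = edges.values()
    if edge.get("knowledge_type") != "inferred":
        return False
    return all(len(node.get("ids") or []) == 1 for node in nodes.values())


pathfinder_style = {
    "nodes": {
        "n0": {"ids": ["PUBCHEM.COMPOUND:5291"]},
        "n1": {"ids": ["MONDO:0011996"]},
    },
    "edges": {
        "e0": {
            "subject": "n0",
            "object": "n1",
            "predicates": ["biolink:related_to"],
            "knowledge_type": "inferred",
        }
    },
}
# Predict-type shape: one end is open (categories only, no IDs).
predict_style = {
    "nodes": {
        "n0": {"categories": ["biolink:ChemicalEntity"]},
        "n1": {"ids": ["MONDO:0011996"]},
    },
    "edges": {
        "e0": {
            "subject": "n0",
            "object": "n1",
            "predicates": ["biolink:treats"],
            "knowledge_type": "inferred",
        }
    },
}
print(is_pathfinder_query(pathfinder_style), is_pathfinder_query(predict_style))
```

A check like this would let the handler pick between the Pathfinder templateGroups and the MVP1-2 ones before any category matching happens.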
B. Odd bug(?) noticed before winter break
I'm not sure if this will be a problem if we adjust the template-group matching and test again with a Pathfinder-template-group + queries. But before winter break, I noticed that:
- BTE's creative-mode currently doesn't allow > 1 QNode ID total, across all QNodes
- Jackson set up a query-handler branch inferred_explain that should've removed that check, but something odd seemed to happen when none of the retrieved edges actually connected the two starting IDs.
C. Problems setting up Pathfinder template-groups
With query-handler's inferred_explain branch checked out, I tried setting up a Pathfinder-template-group with the two generic templates I included above. But I encountered issues (also see my "recreating the problems" section below this list):
- I thought I could set the templateGroup's subject / object to NamedThing so it'd work no matter what the starting-ID's category was. But when I tried this, BTE wouldn't use the template-group. If I set the subject / object to every biolink category, then BTE would use the template-group.
- But then, BTE has an error bte:biothings-explorer-trapi:error_handler TypeError: queryGraph.nodes.creativeQuerySubject.categories is not iterable. I think it's because the generic Pathfinder-templates don't have QNode categories. I intended for BTE to plug in the categories from the NodeNorm ID-lookup...but this doesn't seem to happen here.
- could be mitigated by having different templateGroups for different starting QNode categories and then setting those categories in the templates...but it's kinda redundant having multiple first templates (the 0-intermediate one)
- So the Chem -> DiseaseOrPheno has direct edge, 1 intermediate, and specific ones. Put the starting IDs into the Chem + DoP QNodes so they start w/ those categories, then add the ones from NodeNorm
- I'm not sure if either of these are intended behavior...for example, is there some problem we avoid by not expanding the subject / object categories?
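For comparison, this is the matching behavior I expected when setting subject / object to NamedThing, written as a hypothetical matcher. It is not how BTE's template-group matching actually works (that is the open question above). `group_matches` is a made-up name, and treating NamedThing as a wildcard is the assumption being illustrated.

```python
def group_matches(group, subject_categories, predicate, object_categories):
    """Hypothetical matcher: NamedThing in a group acts as 'any category'."""

    def side_ok(allowed, categories):
        return "NamedThing" in allowed or any(c in allowed for c in categories)

    return (
        side_ok(group["subject"], subject_categories)
        and predicate in group["predicate"]
        and side_ok(group["object"], object_categories)
    )


pathfinder_group = {
    "name": "Pathfinder: find paths between two entities",
    "subject": ["NamedThing"],
    "predicate": ["related_to"],
    "object": ["NamedThing"],
    "templates": ["pathfinder-direct.json", "pathfinder-1intermediate.json"],
}

# Under the wildcard assumption, any starting categories would match.
print(group_matches(pathfinder_group, ["SmallMolecule"], "related_to", ["Disease"]))
# An exact-match-only group would still need its categories listed explicitly.
strict_group = {"subject": ["Drug"], "predicate": ["related_to"], "object": ["Disease"]}
print(group_matches(strict_group, ["SmallMolecule"], "related_to", ["Disease"]))
```

With this kind of matching, the every-biolink-category workaround in the templateGroups file would not be needed.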
recreating the problems
First, check out query-handler's inferred_explain branch and replace the contents of query-handler's templateGroups.json file with this (pnpm build after!):
[
{
"name": "Pathfinder: find paths between two entities",
"subject": ["NamedThing"],
"predicate": ["related_to"],
"object": ["NamedThing"],
"templates": [
"pathfinder-direct.json",
"pathfinder-1intermediate.json"
]
}
]
Second, query BTE with this Pathfinder-style query
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"ids":["PUBCHEM.COMPOUND:5291"],
"name": "imatinib"
},
"n1": {
"ids":["MONDO:0011996"],
"name": "cml"
}
},
"edges": {
"e0": {
"subject": "n0",
"object": "n1",
"predicates": ["biolink:related_to"],
"knowledge_type": "inferred"
}
}
}
}
}
For me, BTE returned no results and the console log said bte:biothings-explorer-trapi:inferred-mode No Templates matched your inferred-mode query. Your query terminates. +0ms
Third, I changed the templateGroups.json contents to include every biolink category (pnpm build after!).
[
{
"name": "Pathfinder: find paths between two entities",
"subject": ["Attribute","ChemicalRole","BiologicalSex","PhenotypicSex","GenotypicSex","SeverityValue","OrganismAttribute","PhenotypicQuality","Zygosity","ClinicalAttribute","ClinicalMeasurement","ClinicalModifier","ClinicalCourse","Onset","SocioeconomicAttribute","GenomicBackgroundExposure","PathologicalProcessExposure","PathologicalAnatomicalExposure","DiseaseOrPhenotypicFeatureExposure","ChemicalExposure","DrugExposure","DrugToGeneInteractionExposure","ComplexChemicalExposure","BioticExposure","EnvironmentalExposure","GeographicExposure","BehavioralExposure","SocioeconomicExposure","OrganismTaxon","Event","AdministrativeEntity","Agent","InformationContentEntity","StudyResult","ConceptCountAnalysisResult","ObservedExpectedFrequencyAnalysisResult","RelativeFrequencyAnalysisResult","TextMiningResult","ChiSquaredAnalysisResult","LogOddsAnalysisResult","StudyVariable","CommonDataElement","Dataset","DatasetDistribution","DatasetVersion","DatasetSummary","ConfidenceLevel","EvidenceType","Publication","Book","BookChapter","Serial","Article","JournalArticle","Patent","WebPage","PreprintPublication","DrugLabel","RetrievalSource","PhysicalEntity","MaterialSample","Activity","Study","Procedure","Phenomenon","Device","DiagnosticAid","PlanetaryEntity","EnvironmentalProcess","EnvironmentalFeature","GeographicLocation","GeographicLocationAtTime","BiologicalEntity","RegulatoryRegion","AccessibleDnaRegion","TranscriptionFactorBindingSite","BiologicalProcessOrActivity","MolecularActivity","BiologicalProcess","Pathway","PhysiologicalProcess","Behavior","PathologicalProcess","GeneticInheritance","OrganismalEntity","Bacterium","Virus","CellularOrganism","Mammal","Human","Plant","Invertebrate","Vertebrate","Fungus","LifeStage","IndividualOrganism","Case","PopulationOfIndividualOrganisms","StudyPopulation","Cohort","AnatomicalEntity","CellularComponent","Cell","GrossAnatomicalStructure","PathologicalAnatomicalStructure","CellLine","DiseaseOrPhenotypicFeature","Disease","PhenotypicFeature",
"BehavioralFeature","ClinicalFinding","Gene","MacromolecularComplex","NucleosomeModification","Genome","Polypeptide","Protein","ProteinIsoform","ProteinDomain","PosttranslationalModification","ProteinFamily","NucleicAcidSequenceMotif","GeneFamily","Genotype","Haplotype","SequenceVariant","Snv","ReagentTargetedGene","ChemicalEntity","MolecularEntity","SmallMolecule","NucleicAcidEntity","Exon","Transcript","RnaProduct","RnaProductIsoform","NoncodingRnaProduct","MicroRna","SiRna","CodingSequence","ChemicalMixture","MolecularMixture","Drug","ComplexMolecularMixture","ProcessedMaterial","Food","EnvironmentalFoodContaminant","FoodAdditive","ClinicalEntity","ClinicalTrial","ClinicalIntervention","Hospitalization","Treatment","NamedThing"],
"predicate": ["related_to"],
"object": ["Attribute","ChemicalRole","BiologicalSex","PhenotypicSex","GenotypicSex","SeverityValue","OrganismAttribute","PhenotypicQuality","Zygosity","ClinicalAttribute","ClinicalMeasurement","ClinicalModifier","ClinicalCourse","Onset","SocioeconomicAttribute","GenomicBackgroundExposure","PathologicalProcessExposure","PathologicalAnatomicalExposure","DiseaseOrPhenotypicFeatureExposure","ChemicalExposure","DrugExposure","DrugToGeneInteractionExposure","ComplexChemicalExposure","BioticExposure","EnvironmentalExposure","GeographicExposure","BehavioralExposure","SocioeconomicExposure","OrganismTaxon","Event","AdministrativeEntity","Agent","InformationContentEntity","StudyResult","ConceptCountAnalysisResult","ObservedExpectedFrequencyAnalysisResult","RelativeFrequencyAnalysisResult","TextMiningResult","ChiSquaredAnalysisResult","LogOddsAnalysisResult","StudyVariable","CommonDataElement","Dataset","DatasetDistribution","DatasetVersion","DatasetSummary","ConfidenceLevel","EvidenceType","Publication","Book","BookChapter","Serial","Article","JournalArticle","Patent","WebPage","PreprintPublication","DrugLabel","RetrievalSource","PhysicalEntity","MaterialSample","Activity","Study","Procedure","Phenomenon","Device","DiagnosticAid","PlanetaryEntity","EnvironmentalProcess","EnvironmentalFeature","GeographicLocation","GeographicLocationAtTime","BiologicalEntity","RegulatoryRegion","AccessibleDnaRegion","TranscriptionFactorBindingSite","BiologicalProcessOrActivity","MolecularActivity","BiologicalProcess","Pathway","PhysiologicalProcess","Behavior","PathologicalProcess","GeneticInheritance","OrganismalEntity","Bacterium","Virus","CellularOrganism","Mammal","Human","Plant","Invertebrate","Vertebrate","Fungus","LifeStage","IndividualOrganism","Case","PopulationOfIndividualOrganisms","StudyPopulation","Cohort","AnatomicalEntity","CellularComponent","Cell","GrossAnatomicalStructure","PathologicalAnatomicalStructure","CellLine","DiseaseOrPhenotypicFeature","Disease","PhenotypicFeature"
,"BehavioralFeature","ClinicalFinding","Gene","MacromolecularComplex","NucleosomeModification","Genome","Polypeptide","Protein","ProteinIsoform","ProteinDomain","PosttranslationalModification","ProteinFamily","NucleicAcidSequenceMotif","GeneFamily","Genotype","Haplotype","SequenceVariant","Snv","ReagentTargetedGene","ChemicalEntity","MolecularEntity","SmallMolecule","NucleicAcidEntity","Exon","Transcript","RnaProduct","RnaProductIsoform","NoncodingRnaProduct","MicroRna","SiRna","CodingSequence","ChemicalMixture","MolecularMixture","Drug","ComplexMolecularMixture","ProcessedMaterial","Food","EnvironmentalFoodContaminant","FoodAdditive","ClinicalEntity","ClinicalTrial","ClinicalIntervention","Hospitalization","Treatment","NamedThing"],
"templates": [
"pathfinder-direct.json",
"pathfinder-1intermediate.json"
]
}
]
Then I tried the same query again. I got status 500 and these console logs:
bte:biothings-explorer-trapi:inferred-mode Query proceeding in Inferred Mode. +0ms
bte:biothings-explorer-trapi:inferred-mode Looking up query Templates +0ms
bte:biothings-explorer-trapi:inferred-mode Got 2 inferred query templates. +10ms
bte:biothings-explorer-trapi:error_handler TypeError: queryGraph.nodes.creativeQuerySubject.categories is not iterable
bte:biothings-explorer-trapi:error_handler at /Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/inferred_mode/inferred_mode.js:168:70
bte:biothings-explorer-trapi:error_handler at Array.map (<anonymous>)
bte:biothings-explorer-trapi:error_handler at InferredQueryHandler.createQueries (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/inferred_mode/inferred_mode.js:166:38)
bte:biothings-explorer-trapi:error_handler at async InferredQueryHandler.query (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/inferred_mode/inferred_mode.js:402:28)
bte:biothings-explorer-trapi:error_handler at async TRAPIQueryHandler._handleInferredEdges (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:505:39)
bte:biothings-explorer-trapi:error_handler at async TRAPIQueryHandler.query (/Users/colleenxu/Desktop/biothings_explorer/packages/query_graph_handler/built/index.js:566:13)
bte:biothings-explorer-trapi:error_handler at async task (/Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/routes/v1/query_v1.js:34:13)
bte:biothings-explorer-trapi:error_handler at async runTask (/Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/controllers/threading/threadHandler.js:265:26)
bte:biothings-explorer-trapi:error_handler at async /Users/colleenxu/Desktop/biothings_explorer/packages/bte-server/built/routes/v1/query_v1.js:18:34 +0ms
D. Other implementation issues
- The second template (1 intermediate) can take a long time to run (> 5 min). Perhaps it'd help if we implemented some of the timing cut-off ideas from group meeting?
- With the second template (1 intermediate), we could end up with paths too long for the current UI to handle (I think it can only handle 3-edge long paths?). For example if both starting IDs have descendants, we could end up with 4-edge paths like startingID1 -> descendant1 -> intermediate <- descendant2 <- startingID2.
- It'll be helpful to have more Pathfinder use-cases, including ones that aren't chem-disease or chem-gene. Are there any in the Feedback / QotM / old-demo stuff?
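The path-length concern in this list is just arithmetic: a template with N intermediates has N + 1 edges, and each starting ID that gets expanded to a descendant adds one more hop. A tiny sketch follows. `max_path_edges` and `UI_MAX_EDGES` are made-up names, and the limit of 3 is the number quoted in this thread.

```python
UI_MAX_EDGES = 3  # assumption: the UI limit mentioned in this thread


def max_path_edges(n_intermediates, expanded_ends):
    """Template edges (n_intermediates + 1) plus one descendant hop per expanded end."""
    if expanded_ends not in (0, 1, 2):
        raise ValueError("expanded_ends must be 0, 1, or 2")
    return (n_intermediates + 1) + expanded_ends


# startingID1 -> descendant1 -> intermediate <- descendant2 <- startingID2
worst_case = max_path_edges(n_intermediates=1, expanded_ends=2)
print(worst_case, worst_case > UI_MAX_EDGES)  # 4 True
```

So the 1-intermediate template fits within 3 edges only if at most one starting ID is expanded to a descendant.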
Data-source note from Andrew: perhaps a Cell marker database (gene <-> cell type and gene <-> tissue) like http://xteam.xbio.top/CellMarker/search.jsp?quickSearchInfo=c-kit would be helpful to add...
@colleenXu Do you have the data from these queries so we can play with it?
We don't have a full TRAPI response (running all the templates and merging the results into 1 set). The paths in the result sub-graphs may also be too long for the current UI to handle (> 3 edges, 4 nodes?).
You could try working with some of BTE's responses for the individual template-runs (you can ignore the extra notes, that's for our team):
- imatinib → asthma with 0 intermediates
- imatinib → Gene → Cell ← asthma: imatinib-gene-cell-asthma-2.json
- 1 min 6 s, 314 results
- All results are pretty low scoring, with lots of ties
- only 4 unique cell entities: mast cell, eosinophil, t-lymphocyte, t-helper cell type 2
- results w/ gene KIT:
- KIT → mast cell is result 42
- c-KIT → mast cell is result 63
- imatinib → Gene → MolecularActivity ← asthma: imatinib-gene-molecularActivity-asthma.json
- 1 min 21 s, 738 results
- All results are pretty low scoring, with lots of ties
- 13 unique molecular activities, most are fairly general terms: Phosphorylation, DNA Methylation, Signal Transduction Pathways, Up-Regulation (Physiology), Lipid metabolism, Biochemical Pathway, enzyme activity, immunoreactivity, histone acetylation, complement activation, receptor function, carbohydrate metabolism, arachidonic acid metabolic process
- KIT → phosphorylation is result 666
- imatinib → NamedThing ← asthma: imatinib-inter-asthma-4.json
- for notes, look at the "imatinib ➡️ 1 intermediate ⬅️ asthma" section of the above post
- note that it doesn't start out with a category on the imatinib QNode.
Queries run locally for prototype presentation: https://docs.google.com/presentation/d/1gFFGJGumtHU_ktHKM2FKauTpC-0bvAh-H_ZI49-qDsI/edit?usp=sharing
all queries set imatinib as ChemicalEntity, disease as DiseaseOrPhenotypicFeature. I'm assuming BTE's templates would set the template placeholder nodes to these categories - which is how our templates/implementation currently work. We could adjust this to have no "template categories" in the future maybe?
(Ran on local instance, main branches + fix-776 branches for workspace/api-response-transform for #776. Also w/o threading or caching.)
1 intermediate
imatinib (ChemicalEntity) → NamedThing ← asthma (DiseaseOrPheno)
imatinib-inter-asthma-latest.json
- 6 min 47 s, 949 results
- top results are still KIT, PDGFRA
imatinib (ChemicalEntity) → NamedThing ← CML (DiseaseOrPheno)
imatinib-inter-cml-latest.json
- 5 min 37 s, 1546 results
- top results are still BCR, ABL1
Gene → Cell
See previous post for imatinib → Gene → Cell ← asthma
imatinib (ChemicalEntity) → Gene → Cell ← CML (DiseaseOrPheno)
- 1 min 45 s, 1419 results
- 15 unique entities: interesting ones are Hematopoietic stem cells, Blast Cell, Bone Marrow Cells, granulocyte, Pluripotent Stem Cells
- others: cultured cell line, t-lymphocyte, stem cells, Lymphocyte, Neoplastic Cell, Clone Cells, K-562, Leukemic Cell, lymphoblast, Blood Cells
- results w/ gene BCR:
- BCR → blast cell is result 8 (also connected to hematopoietic stem cells in result 215, bone marrow cells in result 642)
- fusion proteins, bcr-abl → hematopoietic stem cells is result 206
- results w/ gene ABL1:
- ABL1 → hematopoietic stem cells in result 180 (bone marrow cells in result 606)
Gene → PhysiologicalProcess,Pathway
imatinib (ChemicalEntity) → Gene → PhysiologicalProcess,Pathway ← Asthma (DiseaseOrPheno)
imatinib-gene-physiopath-asthma.json
- 6 min 35 s, 1254 results
- Doing this because
- BiologicalProcess takes too long to run (>13 min w/ 21 unique intermediates) - these are the most promising children terms
- most interesting is "IgE responsiveness, atopic"
- KIT connected to edema (HP:0000969) and cardiac rhythm disease (MONDO:0007263), anaphylaxis (MONDO:0100053), respiratory arrest (HP:0005943)
- also, MolecularActivity wasn't interesting (see previous post)
- no exact matches for "immune cell activation", but some stuff is close
- no pathways found
- 38 physiologicalprocess terms, most were generic. Some interesting ones were:
- immune response: results 166-195
- bronchoconstriction: result 234
- histamine release
- t-cell activation
- neutrophil infiltration
- immune cell processes
- host defense
- antiviral response
- cytokine production: results 236 - 245
- Negative Regulation of Inflammatory Response Process
imatinib (ChemicalEntity) → Gene → PhysiologicalProcess, Pathway ← CML (DiseaseOrPheno)
imatinib-gene-physiopath-cml.json
- 3 min 35 s, 3398 results
- Doing this because BiologicalActivity would probably take too long to run
- no exact matches for "cell cycle", but some stuff is close
- 29 Pathways found! some interesting ones:
- Cyclin D associated events in G1 (Homo sapiens) - reactome
- pathways in cancer - bioplanet
- Inhibition of cellular proliferation by Gleevec - bioplanet
- Chronic myeloid leukemia - bioplanet
- 12 physiologicalprocess terms. Some interesting ones:
- cell proliferation: results 8-282. BCR in 102, ABL1 in 279.
- lymphocyte activation (results 1-7)
- mitotic metaphase
- negative regulation of g2 phase
I did not respond to one of your previous comments, but 3 edges is fine. That is the max though. I look forward to seeing this!
Closing, pathfinder efforts are now in #794