ohsu-comp-bio / g2p-aggregator

Associations of genomic features, drugs and diseases


jax trials feature harmonization

bwalsh opened this issue

commented

Following up on our conversation re. jax trials.

Observations:

  • It is unusual that all jax trials have harmonized features
  • For example, all features on this evidence item have a genomic location

I've checked in a test that illustrates the problem: harvester/tests/integration/test_jax_trials_features.py

I've validated that the MM variant endpoint does not return a false hit.

There is a chain of potential challenges and fixes:

To run:
$ pytest -s tests/integration/test_jax_trials_features.py::test_profiles

profile >MET alterations< gene_index[i] >MET< mut_index[i] >< matches 418
profile >MET positive< gene_index[i] >MET< mut_index[i] >< matches 418
profile >MET amp< gene_index[i] >MET< mut_index[i] >< matches 418
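The empty mut_index values above are consistent with a profile parser that keeps only tokens shaped like a protein change. Here is a minimal sketch of such a parser, assuming whitespace-separated profiles and a simple regex for single-residue changes like V600E. `parse_profile` is a hypothetical stand-in: the harvester's real _parse_profile is not shown in this thread and likely handles more forms.

```python
import re
from typing import List, Tuple

# Assumed pattern for a single-residue protein change such as "V600E" or
# "R30*"; the real _parse_profile may also accept fusions, fs, del, etc.
PROTEIN_CHANGE = re.compile(r"^[A-Z]\d+[A-Z*]$")


def parse_profile(profile: str) -> Tuple[str, List[str]]:
    """Split a profile like 'MET amp' into (gene, protein changes)."""
    tokens = profile.split()
    if not tokens:
        return "", []
    gene, rest = tokens[0], tokens[1:]
    # Words like "amp", "positive" or "alterations" are not protein changes,
    # so they are dropped and the mutation list comes back empty.
    mutations = [t for t in rest if PROTEIN_CHANGE.match(t)]
    return gene, mutations


for p in ("MET alterations", "MET positive", "MET amp", "BRAF V600E"):
    print(p, "->", parse_profile(p))
```

With this sketch the three MET profiles all yield `('MET', [])`, matching the empty mut_index in the log.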

The simplest approach is to bypass any normalization attempts if _parse_profile does not return any mutations. The key issue is that MM's API appears to return all mutations in a gene when given something like "MET amp", which is confusing and problematic in our case.
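A minimal sketch of that bypass, assuming a hypothetical `normalize_feature` helper and a `lookup(gene, hgvs_p)` callable standing in for the MM/COSMIC lookups (neither is the harvester's actual API). An empty mutation list short-circuits before any lookup is issued:

```python
from typing import Callable, Dict, List, Optional

# Hypothetical lookup signature: (gene, hgvs_p) -> list of location records.
Lookup = Callable[[str, str], List[Dict]]


def normalize_feature(gene: str, mutations: List[str], lookup: Lookup) -> Optional[List[Dict]]:
    # No parsed mutations (e.g. "MET amp"): skip normalization entirely
    # rather than querying with an empty protein change.
    if not mutations:
        return None
    hits: List[Dict] = []
    for hgvs_p in mutations:
        hits.extend(lookup(gene, hgvs_p))
    return hits


# Toy lookup that only records what it was asked; it never finds anything.
calls: List[tuple] = []


def recording_lookup(gene: str, hgvs_p: str) -> List[Dict]:
    calls.append((gene, hgvs_p))
    return []


print(normalize_feature("MET", [], recording_lookup))        # None
print(calls)                                                 # []
print(normalize_feature("MET", ["A10T"], recording_lookup))  # []
print(calls)                                                 # [('MET', 'A10T')]
```

Returning None rather than an empty list lets a caller distinguish "not attempted" from "attempted, no match"; that distinction is a design choice of this sketch.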

commented

@jgoecks

Quick clarification: MM correctly returns no mutations. I've validated that the MM variant endpoint does not return a false hit.

I've made the following change:

--- a/harvester/cosmic_lookup_table.py
+++ b/harvester/cosmic_lookup_table.py
@@ -39,6 +39,9 @@ class CosmicLookup(object):
             # return null
             logging.warning('get_entries gene: %s, hgvs_p: %s', gene, hgvs_p)
             return []
+        # ensure caller passed a hgvs_p
+        if not hgvs_p or len(hgvs_p) == 0:
+            return []
         # Get lookup table.
         if gene in self.gene_df_cache:
             # Found gene-filtered lookup table in cache.

Feature normalization for jax and jax_trials went from 85% and 93% to 43% and 11%, respectively.

Ah, so the issue was that the COSMIC lookup table was returning all mutations, not MM. Your change looks reasonable. Thanks!
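One plausible way a lookup table ends up returning every mutation for a gene: if rows are filtered by substring match on the protein change, an empty hgvs_p is contained in every string, so nothing gets filtered out. The sketch below assumes that matching strategy and uses made-up toy rows; it does not reproduce CosmicLookup's actual filtering.

```python
from typing import Dict, List, Optional

# Toy rows with made-up protein changes (the real table comes from COSMIC).
TOY_ROWS: List[Dict[str, str]] = [
    {"gene": "MET", "hgvs_p": "p.A10T"},
    {"gene": "MET", "hgvs_p": "p.G20S"},
    {"gene": "MET", "hgvs_p": "p.R30Q"},
    {"gene": "EGFR", "hgvs_p": "p.L858R"},
]


def get_entries_unguarded(gene: str, hgvs_p: str) -> List[Dict[str, str]]:
    # Assumed substring match: "" is contained in every string, so an empty
    # hgvs_p keeps every row for the gene.
    return [r for r in TOY_ROWS if r["gene"] == gene and hgvs_p in r["hgvs_p"]]


def get_entries_guarded(gene: str, hgvs_p: Optional[str]) -> List[Dict[str, str]]:
    # Same idea as the patch: no protein change means no matches.
    if not hgvs_p:
        return []
    return [r for r in TOY_ROWS if r["gene"] == gene and hgvs_p in r["hgvs_p"]]


print(len(get_entries_unguarded("MET", "")))    # 3
print(len(get_entries_guarded("MET", "")))      # 0
print(len(get_entries_guarded("MET", "A10T")))  # 1
```

In Python, `not hgvs_p` is already true for both None and "", so the `len(hgvs_p) == 0` check in the patch is redundant but harmless.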

@bwalsh When I run this test I see:

profile >MET alterations< gene_index[i] >MET< mut_index[i] >< matches 0
profile >MET positive< gene_index[i] >MET< mut_index[i] >< matches 0
profile >MET amp< gene_index[i] >MET< mut_index[i] >< matches 0

And then the test fails with:

E       AssertionError: assert 3 == 0
E        +  where 3 = len([{'biomarker_type': 'mutant', 'geneSymbol': 'MET', 'name': 'MET  '}, {'biomarker_type': 'polymorphism', 'geneSymbol': 'MET', 'name': 'MET  positive'}, {'biomarker_type': 'polymorphism', 'geneSymbol': 'MET', 'name': 'MET  amp'}])

Based on the way this test is written, I think it really should be 3. Why did we have 0 before?
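One explanation consistent with the logs, though not confirmed in this thread: if the test collects features that still lack a genomic location, the old behavior (418 spurious matches per MET profile) gave every feature a location and left the list empty, whereas with the hgvs_p guard nothing matches and all three features remain. A sketch under that assumption, using a hypothetical `start` key as the harmonization marker and a placeholder location value:

```python
from typing import Dict, List

# Features copied from the assertion message above.
FEATURES: List[Dict] = [
    {"biomarker_type": "mutant", "geneSymbol": "MET", "name": "MET  "},
    {"biomarker_type": "polymorphism", "geneSymbol": "MET", "name": "MET  positive"},
    {"biomarker_type": "polymorphism", "geneSymbol": "MET", "name": "MET  amp"},
]


def unharmonized(features: List[Dict]) -> List[Dict]:
    # Hypothetical criterion: a feature is harmonized once it has a 'start'.
    return [f for f in features if "start" not in f]


def add_location(feature: Dict, matches: int) -> Dict:
    # Hypothetical normalizer: attach a placeholder location whenever the
    # lookup reported any match at all.
    if matches > 0:
        return {**feature, "start": 0}
    return dict(feature)


before = [add_location(f, 418) for f in FEATURES]  # lookup matched 418 rows
after = [add_location(f, 0) for f in FEATURES]     # lookup matched nothing
print(len(unharmonized(before)), len(unharmonized(after)))  # 0 3
```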

commented

Thanks, will check tomorrow.