ohsu-comp-bio / g2p-aggregator

Associations of genomic features, drugs and diseases

jax trials feature harmonization

bwalsh opened this issue


Following up on our conversation re. jax trials.


  • It is unusual that all jax trials have harmonized features.
  • For example, all features on this evidence item have a genomic location.

I've checked in a test to illustrate the problem: harvester/tests/integration/test_jax_trials_features.py

I've validated that the MM variant endpoint does not return a false hit.

There is a chain of potential challenges and fixes:

To run:
$ pytest -s tests/integration/test_jax_trials_features.py::test_profiles

profile >MET alterations< gene_index[i] >MET< mut_index[i] >< matches 418
profile >MET positive< gene_index[i] >MET< mut_index[i] >< matches 418
profile >MET amp< gene_index[i] >MET< mut_index[i] >< matches 418

The simplest approach is to bypass any normalization attempts if _parse_profile does not return any mutations. The key issue is that MM's API appears to return all mutations in a gene when given something like "MET amp", which is confusing and problematic in our case.
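A minimal sketch of that bypass, under stated assumptions: `parse_profile` below is a hypothetical stand-in for `_parse_profile` (its real signature and return value are not shown in this thread), and `lookup` is any callable that maps a gene and a protein change to matches. The only point illustrated is that the lookup is never called when no mutation was parsed.

```python
import re


def parse_profile(profile):
    """Hypothetical stand-in for _parse_profile (assumption, not the real code).

    Splits 'GENE rest' and keeps 'rest' as the mutation only when it looks
    like a simple protein change such as V600E; otherwise returns ''.
    """
    parts = profile.split(None, 1)
    gene = parts[0] if parts else ''
    rest = parts[1] if len(parts) > 1 else ''
    mutation = rest if re.fullmatch(r'[A-Z]\d+[A-Z*]', rest) else ''
    return gene, mutation


def normalize(profile, lookup):
    """Return lookup matches, or [] without calling lookup if no mutation parsed."""
    gene, mutation = parse_profile(profile)
    if not mutation:
        # Nothing specific to look up (e.g. 'MET amp'), so skip normalization.
        return []
    return lookup(gene, mutation)
```

With this shape, profiles like "MET amp" or "MET positive" yield an empty list regardless of how the downstream lookup treats an empty mutation string.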



Quick clarification: MM does properly return no mutations.

I've validated that the MM variant endpoint does not return a false hit.

I've made the following change:

+++ b/harvester/cosmic_lookup_table.py
@@ -39,6 +39,9 @@ class CosmicLookup(object):
             # return null
             logging.warning('get_entries gene: %s, hgvs_p: %s', gene, hgvs_p)
             return []
+        # ensure caller passed a hgvs_p
+        if not hgvs_p or len(hgvs_p) == 0:
+            return []
         # Get lookup table.
         if gene in self.gene_df_cache:
             # Found gene-filtered lookup table in cache.
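To illustrate why a guard like this matters, here is a toy sketch. `TinyLookup` is a hypothetical stand-in, not the real `CosmicLookup`, and the substring match is an assumption about one way an empty `hgvs_p` could end up matching every row for a gene; the rows are made-up placeholders.

```python
class TinyLookup(object):
    """Toy gene/protein-change table (illustrative only, not CosmicLookup)."""

    def __init__(self, rows):
        # rows: iterable of (gene, hgvs_p, location) tuples
        self.rows = list(rows)

    def get_entries(self, gene, hgvs_p, guard=True):
        # Guard: ensure the caller passed a non-empty hgvs_p.
        if guard and not hgvs_p:
            return []
        # Assumed matching rule: substring match on the protein change.
        # Note that '' is a substring of every string, so an empty hgvs_p
        # would match every row for the gene if the guard were skipped.
        return [r for r in self.rows if r[0] == gene and hgvs_p in r[1]]


table = TinyLookup([
    ('MET', 'p.A1B', 'loc-1'),   # placeholder rows, not real variants
    ('MET', 'p.C2D', 'loc-2'),
    ('BRAF', 'p.E3F', 'loc-3'),
])
unguarded = table.get_entries('MET', '', guard=False)
guarded = table.get_entries('MET', '')
```

In Python, `not hgvs_p` is already true for both `None` and the empty string, so the additional `len(hgvs_p) == 0` check in the diff is harmless but redundant.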

Feature normalization for jax and jax_trials went from 85% and 93% to 43% and 11%, respectively.

Ah, so the issue was that the COSMIC lookup table was returning all mutations, not MM. Your change looks reasonable. Thanks!

@bwalsh When I run this test I see:

profile >MET alterations< gene_index[i] >MET< mut_index[i] >< matches 0
profile >MET positive< gene_index[i] >MET< mut_index[i] >< matches 0
profile >MET amp< gene_index[i] >MET< mut_index[i] >< matches 0

And then the test fails with

E       AssertionError: assert 3 == 0
E        +  where 3 = len([{'biomarker_type': 'mutant', 'geneSymbol': 'MET', 'name': 'MET  '}, {'biomarker_type': 'polymorphism', 'geneSymbol': 'MET', 'name': 'MET  positive'}, {'biomarker_type': 'polymorphism', 'geneSymbol': 'MET', 'name': 'MET  amp'}])

Based on the way this test is written, I think it really should be 3. Why did we have 0 before?


thanks, will check tomorrow