Feature tagged by cgi as 'CNA', but biomarker_type set to 'nonsense'
bwalsh opened this issue
Hey @bwalsh, I can't seem to replicate this behavior. There was an issue where no biomarker type was showing up, which I've fixed, but nowhere can I see the biomarker type set to 'nonsense'. Where are you seeing this?
This was from the CGI source. In the notebook that @bwalsh took screenshots from, the key that held the response data was remapped from 'cgi' to 'raw'.
So for an association from CGI, you'd want to examine:
association['cgi']
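Because of that remapping, code that inspects the original CGI record may need to look under either key. A minimal sketch, assuming an association is a plain dict with the record under 'cgi' in harvested data and under 'raw' in the notebook data; the helper name source_record is hypothetical, not part of g2p-aggregator:

```python
from typing import Any, Dict, Optional


def source_record(association: Dict[str, Any], source: str = "cgi") -> Optional[Dict[str, Any]]:
    """Return the source's raw record from an association document.

    Hypothetical helper: harvested documents are assumed to keep the record
    under the source name (e.g. 'cgi'); the notebook remapped that key to 'raw'.
    """
    if source in association:  # checks for a *key* named 'cgi', not the 'source' field
        return association[source]
    return association.get("raw")  # falls back to the remapped key, else None


# Notebook-style document: 'source' is a field, the record sits under 'raw'.
doc = {"source": "cgi", "raw": {"Alteration type": "CNA", "Biomarker": "FLCN deletion"}}
print(source_record(doc)["Alteration type"])  # CNA
```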
I think, but have not yet confirmed, that biomarker_type is being set here:
https://github.com/ohsu-comp-bio/g2p-aggregator/blob/v0.9/harvester/feature_enricher.py#L113-L117
Another instance of this:
{'association': {'description': 'FLCN Everolimus (MTOR inhibitor) Responsive',
                 'drug_labels': 'EVEROLIMUS',
                 'environmentalContexts': [{'approved_countries': ['Canada', 'US'],
                                            'description': 'EVEROLIMUS',
                                            'id': 'CID6442177',
                                            'source': 'http://rdf.ncbi.nlm.nih.gov/pubchem/compound',
                                            'taxonomy': {'class': 'Macrolide lactams',
                                                         'direct-parent': 'Macrolide lactams',
                                                         'kingdom': 'Organic compounds',
                                                         'superclass': 'Phenylpropanoids and polyketides'},
                                            'term': 'EVEROLIMUS',
                                            'toxicity': 'IC50 of 0.63 nM.',
                                            'usan_stem': 'immunosuppressives: immunosuppressant, rapamycin derivatives'}],
                 'evidence': [{'description': 'Responsive',
                               'evidenceType': {'sourceName': 'cgi'},
                               'info': {'publications': ['http://www.ncbi.nlm.nih.gov/pubmed/25295501']}}],
                 'evidence_label': 'C',
                 'evidence_level': 3,
                 'phenotype': {'description': 'thyroid cancer',
                               'family': 'endocrine gland cancer',
                               'type': {'id': 'DOID:1781',
                                        'source': 'http://purl.obolibrary.org/obo/doid',
                                        'term': 'thyroid cancer'}},
                 'publication_url': 'http://www.ncbi.nlm.nih.gov/pubmed/25295501',
                 'response_type': 'Responsive'},
 'dev_tags': [],
 'feature_names': 'FLCN deletion',
 'features': [{'alt': 'A',
               'biomarker_type': 'nonsense',
               'chromosome': '17',
               'description': 'FLCN deletion',
               'end': 17122384,
               'geneSymbol': 'FLCN',
               'links': ['http://myvariant.info/v1/variant/chr17:g.17122384G>A?assembly=hg19',
                         'http://reg.genome.network/refseq/RS000041',
                         'http://reg.genome.network/refseq/RS000017',
                         'http://reg.genome.network/refseq/RS000765',
                         'http://reg.genome.network/allele/CA498163528',
                         'http://reg.genome.network/refseq/RS000065'],
               'name': 'FLCN deletion',
               'provenance': ['http://myvariant.info/v1/query?q=FLCN deletion',
                              'http://reg.genome.network/allele?hgvs=NC_000017.10%3Ag.17122384G%3EA'],
               'provenance_rule': 'default_feature',
               'ref': 'G',
               'referenceName': 'GRCh37',
               'sequence_ontology': {'hierarchy': ['SO:0001060',
                                                   'SO:0001537',
                                                   'SO:0001878',
                                                   'SO:0001564',
                                                   'SO:0001576',
                                                   'SO:0001791',
                                                   'SO:0001580',
                                                   'SO:0001818',
                                                   'SO:0001650',
                                                   'SO:0001992'],
                                     'name': 'stop_gained',
                                     'parent_name': 'sequence_variant',
                                     'parent_soid': 'SO:0001060',
                                     'soid': 'SO:0001587'},
               'start': 17122384,
               'synonyms': ['NG_008001.2:g.23119C>T',
                            'NC_000017.11:g.17219070G>A',
                            'LRG_325:g.23119C>T',
                            'NC_000017.10:g.17122384G>A',
                            'CM000679.1:g.17122384G>A',
                            'chr17:g.17122384G>A',
                            'CM000679.2:g.17219070G>A',
                            'NC_000017.9:g.17063109G>A']}],
 'genes': ['FLCN'],
 'raw': {'Alteration': 'FLCN:del',
         'Alteration type': 'CNA',
         'Assay type': '',
         'Association': 'Responsive',
         'Biomarker': 'FLCN deletion',
         'Curator': 'RDientsmann',
         'Drug': 'Everolimus',
         'Drug family': 'MTOR inhibitor',
         'Drug full name': 'Everolimus (MTOR inhibitor)',
         'Drug status': '',
         'Evidence level': 'Case report',
         'Gene': 'FLCN',
         'Metastatic Tumor Type': '',
         'Primary Tumor type': 'Thyroid',
         'Source': 'PMID:25295501',
         'Targeting': '',
         'cDNA': [''],
         'gDNA': [''],
         'individual_mutation': [''],
         'info': [''],
         'region': [''],
         'strand': [''],
         'transcript': ['']},
 'source': 'cgi',
 'tags': []}
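The record shows the contradiction on its own: the CGI row has 'Alteration type': 'CNA' and 'Biomarker': 'FLCN deletion', but the feature was built by provenance_rule 'default_feature' from a myvariant.info query for "FLCN deletion", which appears to have resolved to a single G>A change whose sequence_ontology name is 'stop_gained'. A minimal sketch of a check that flags such records, assuming documents shaped like the one above; the function name and the set of point-mutation labels are illustrative assumptions, not g2p-aggregator code:

```python
from typing import Any, Dict, List

# Illustrative assumption: biomarker_type values that describe point mutations
# and therefore should not appear on a copy-number (CNA) alteration.
POINT_MUTATION_TYPES = {"nonsense", "missense", "snp", "substitution"}


def cna_mismatches(association: Dict[str, Any]) -> List[str]:
    """Return names of features whose biomarker_type contradicts a CGI 'CNA' row."""
    # The CGI row may sit under 'cgi' (harvested) or 'raw' (notebook remap).
    raw = association.get("cgi") or association.get("raw") or {}
    if raw.get("Alteration type") != "CNA":
        return []
    return [
        feature.get("name", "")
        for feature in association.get("features", [])
        if feature.get("biomarker_type") in POINT_MUTATION_TYPES
    ]


# Trimmed copy of the FLCN record above.
flcn = {
    "source": "cgi",
    "features": [{"name": "FLCN deletion", "biomarker_type": "nonsense",
                  "provenance_rule": "default_feature"}],
    "raw": {"Alteration type": "CNA", "Biomarker": "FLCN deletion"},
}
print(cna_mismatches(flcn))  # ['FLCN deletion']
```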
@ahwagner: can you double-check? This data appears to predate the v0.10 provenance rules.
The data I have for this association is:
$ cat cgi.json | grep "FLCN deletion" | jq .features
[
  {
    "provenance_rule": "is_deletion",
    "end": 17140502,
    "description": "FLCN deletion",
    "provenance": [
      "http://mygene.info/v3/query?q=FLCN&fields=genomic_pos_hg19"
    ],
    "start": 17115526,
    "referenceName": "GRCh37",
    "geneSymbol": "FLCN",
    "chromosome": "17",
    "name": "FLCN deletion"
  }
]
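The grep | jq pipeline above only works if cgi.json holds one JSON document per line. Under that assumption, a rough Python equivalent, handy when the matches need further filtering, might look like this; the file name comes from the command, and the function name is hypothetical:

```python
import json
from typing import Any, Iterator


def features_matching(path: str, needle: str) -> Iterator[Any]:
    """Yield the 'features' value of every line in a JSON-lines file containing needle.

    Mirrors: cat cgi.json | grep "FLCN deletion" | jq .features
    Like grep, this matches the raw line text, so the needle may hit any field.
    A line without 'features' yields None, as jq would print null.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if needle in line:
                yield json.loads(line).get("features")


# Usage (assumes cgi.json is in the working directory):
#   for features in features_matching("cgi.json", "FLCN deletion"):
#       print(json.dumps(features, indent=2))
```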
I believe we have resolved the original problem reported at the top of this thread.
The current feature is:
{
  "provenance_rule": "is_deletion",
  "end": 1228428,
  "name": "STK11 deletion",
  "sequence_ontology": {
    "hierarchy": [
      "SO:0000110",
      "SO:0002072",
      "SO:0001059"
    ],
    "soid": "SO:0000159",
    "parent_name": "sequence_feature",
    "name": "deletion",
    "parent_soid": "SO:0000110"
  },
  "provenance": [
    "http://mygene.info/v3/query?q=STK11&fields=genomic_pos_hg19"
  ],
  "start": 1189406,
  "biomarker_type": "del",
  "referenceName": "GRCh37",
  "geneSymbol": "STK11",
  "chromosome": "19",
  "description": "STK11 deletion"
}
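To keep this from regressing, one option is a small consistency check over deletion features. A minimal sketch, assuming features shaped like the STK11 one above; the expected pairing of provenance_rule 'is_deletion' with biomarker_type 'del' and sequence_ontology name 'deletion' is read off this single example, not taken from a documented g2p-aggregator contract, and the function name is hypothetical:

```python
from typing import Any, Dict, List


def deletion_feature_problems(feature: Dict[str, Any]) -> List[str]:
    """List inconsistencies in a feature produced by the 'is_deletion' rule."""
    problems: List[str] = []
    if feature.get("provenance_rule") != "is_deletion":
        return problems  # only deletion-rule features are checked
    if feature.get("biomarker_type") != "del":
        problems.append(f"biomarker_type is {feature.get('biomarker_type')!r}, expected 'del'")
    so_name = feature.get("sequence_ontology", {}).get("name")
    if so_name != "deletion":
        problems.append(f"sequence_ontology name is {so_name!r}, expected 'deletion'")
    if feature.get("start", 0) > feature.get("end", 0):
        problems.append("start is after end")
    return problems


# Trimmed copy of the STK11 feature above.
stk11 = {
    "provenance_rule": "is_deletion",
    "name": "STK11 deletion",
    "start": 1189406,
    "end": 1228428,
    "biomarker_type": "del",
    "sequence_ontology": {"soid": "SO:0000159", "name": "deletion"},
}
print(deletion_feature_problems(stk11))  # []
```

Run against the older FLCN output above, the same check would report the missing biomarker_type and sequence_ontology fields.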
@ahwagner, @mayfielg: can we close this?
Will review today.