Harvest feature's protein effects, domains and pathways
bwalsh opened this issue
Brian commented
Creating this issue as a placeholder for future discussion.
We have already done some of this work here on our local instance at OHSU.
As a researcher, in order to find more evidence for a particular allele, I would like the system to match by:
- protein effect
- protein domain
- pathway
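A minimal sketch of what matching on these three criteria could look like, assuming each feature carries list-valued `protein_effects`, `protein_domains` and `pathways` keys. The function name `match_reasons` is hypothetical and this is not the existing matching code:

```python
def match_reasons(query, candidate):
    """Return the criteria on which two features share at least one value."""

    def as_set(feature, key):
        values = feature.get(key, [])
        # protein_domains entries are dicts, so freeze them to make them hashable
        return {
            tuple(sorted(v.items())) if isinstance(v, dict) else v
            for v in values
        }

    criteria = ("protein_effects", "protein_domains", "pathways")
    return [key for key in criteria if as_set(query, key) & as_set(candidate, key)]
```

Returning the list of overlapping criteria, rather than a boolean, lets the caller show why a piece of evidence was matched.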
Given a feature:
{'start': 204494718L, 'description': '', 'name': 'MDM4 ', 'referenceName': 'GRCh37', 'geneSymbol': 'MDM4', 'alt': 'G', 'ref': 'C', 'chromosome': '1', 'biomarker_type': 'substitution'}
Because we already have the genomic location, our current process starts by querying the allele registry:
http://reg.genome.network/allele?hgvs=NC_000001.10%3Ag.204494718C%3EG
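For reference, a rough sketch of how that URL can be derived from the feature. The helper name and the accession table are assumptions for illustration; the table only holds the one chromosome-to-accession entry that appears in this issue. It is written for Python 3, so the `L` long suffix is dropped:

```python
from urllib.parse import quote

# Assumption: GRCh37 chromosome -> RefSeq accession lookup.
# Only the entry seen in this issue is filled in.
GRCH37_ACCESSIONS = {"1": "NC_000001.10"}


def allele_registry_url(feature):
    """Build the allele registry lookup URL for a simple substitution."""
    accession = GRCH37_ACCESSIONS[feature["chromosome"]]
    hgvs = "{}:g.{}{}>{}".format(
        accession, feature["start"], feature["ref"], feature["alt"]
    )
    # safe="" makes quote() percent-encode ':' and '>' as well
    return "http://reg.genome.network/allele?hgvs=" + quote(hgvs, safe="")
```

This only covers substitutions; other biomarker types would need their own HGVS formatting.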
The new process would continue harvesting with VEP and Pathway Commons:
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000356148.1:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000396840.2:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000443816.2:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000478080.1:p.Ile24Met?domains=1&protein=1&uniprot=1
{u'error': u"Unable to parse HGVS notation 'ENSP00000478080.1:p.Ile24Met': Could not get a Transcript object for 'ENSP00000478080'"}
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000478581.1:p.Ile24Met?domains=1&protein=1&uniprot=1
{u'error': u"Unable to parse HGVS notation 'ENSP00000478581.1:p.Ile24Met': Could not get a Transcript object for 'ENSP00000478581'"}
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000356151.3:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000375811.2:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000482479.1:p.Ile24Met?domains=1&protein=1&uniprot=1
{u'error': u"Unable to parse HGVS notation 'ENSP00000482479.1:p.Ile24Met': Could not get a Transcript object for 'ENSP00000482479'"}
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000356150.3:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000482388.1:p.Ile24Met?domains=1&protein=1&uniprot=1
{u'error': u"Unable to parse HGVS notation 'ENSP00000482388.1:p.Ile24Met': Could not get a Transcript object for 'ENSP00000482388'"}
http://www.pathwaycommons.org/pc2/search.json?q=O15151&organism=homo%20sapiens
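The requests above could be generated and their responses folded together roughly as follows. This is a sketch under assumptions, not the harvester itself: the function names are hypothetical, the `transcript_consequences`, `domains` and `swissprot` field names are what I would expect in the VEP JSON when `domains=1` and `uniprot=1` are set, and only `ENSP` identifiers are sent to VEP because those are the only ones queried above. Responses are passed in already parsed, so no network access is needed to try it:

```python
from urllib.parse import quote

VEP_TEMPLATE = (
    "https://grch37.rest.ensembl.org/vep/human/hgvs/{}"
    "?domains=1&protein=1&uniprot=1"
)


def vep_urls(protein_effects):
    """One VEP request per Ensembl protein effect; NP_/XP_ ids are skipped."""
    return [VEP_TEMPLATE.format(p) for p in protein_effects if p.startswith("ENSP")]


def pathway_commons_url(accession):
    """Pathway Commons search URL for a UniProt accession."""
    return "http://www.pathwaycommons.org/pc2/search.json?q={}&organism={}".format(
        accession, quote("homo sapiens")
    )


def collect_vep(responses):
    """Merge domains and Swiss-Prot accessions from parsed VEP responses."""
    domains, swissprots = [], []
    for response in responses:
        # Failed lookups come back as a dict with an 'error' key; skip them
        if isinstance(response, dict) and "error" in response:
            continue
        for hit in response:
            for consequence in hit.get("transcript_consequences", []):
                for domain in consequence.get("domains", []):
                    if domain not in domains:
                        domains.append(domain)
                for accession in consequence.get("swissprot", []):
                    if accession not in swissprots:
                        swissprots.append(accession)
    return domains, swissprots
```

Skipping the error responses keeps one unparseable protein identifier from discarding the domains found for the others.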
The result adds the following keys to the feature:
- pathways
- swissprots
- protein_effects
- protein_domains
For example:
{
'pathways': [
u'http://pathwaycommons.org/pc2/Pathway_1d99625aa7f8f4a3f3189e8b237522c3',
u'http://identifiers.org/reactome/R-HSA-597592',
u'http://identifiers.org/reactome/R-HSA-212436',
u'http://identifiers.org/reactome/R-HSA-5688426',
u'http://identifiers.org/reactome/R-HSA-69563',
u'http://identifiers.org/reactome/R-HSA-69541',
u'http://identifiers.org/panther.pathway/P00033',
u'http://identifiers.org/reactome/R-HSA-69580',
u'http://identifiers.org/panther.pathway/P00059',
u'http://identifiers.org/reactome/R-HSA-8953897',
u'http://identifiers.org/reactome/R-HSA-6804757',
u'http://identifiers.org/reactome/R-HSA-6804756',
u'http://identifiers.org/reactome/R-HSA-69620',
u'http://identifiers.org/panther.pathway/P04392',
u'http://identifiers.org/reactome/R-HSA-1640170',
u'http://identifiers.org/reactome/R-HSA-73857',
u'http://identifiers.org/reactome/R-HSA-5633007',
u'http://identifiers.org/reactome/R-HSA-74160',
u'http://identifiers.org/reactome/R-HSA-2262752',
u'http://identifiers.org/reactome/R-HSA-69615',
u'http://identifiers.org/reactome/R-HSA-5689880',
u'http://identifiers.org/reactome/R-HSA-2559585',
u'http://identifiers.org/reactome/R-HSA-3700989',
u'http://identifiers.org/reactome/R-HSA-6804760',
u'http://identifiers.org/reactome/R-HSA-2559580',
u'http://identifiers.org/reactome/R-HSA-2559583',
u'http://identifiers.org/reactome/R-HSA-6806003',
u'http://identifiers.org/reactome/R-HSA-392499',
],
'provenance_rule': 'from_source',
'provenance': ['http://reg.genome.network/allele?hgvs=NC_000001.10%3Ag.204494718C%3EG'],
'end': 204494718L,
'description': '',
'links': [
u'http://myvariant.info/v1/variant/chr1:g.204525590C>G?assembly=hg38',
u'http://myvariant.info/v1/variant/chr1:g.204494718C>G?assembly=hg19',
u'http://reg.genome.network/refseq/RS000001',
u'http://reg.genome.network/refseq/RS000025',
u'http://reg.genome.network/refseq/RS000049',
u'http://reg.genome.network/refseq/RS004351',
u'http://reg.genome.network/allele/CA344360355',
],
'sequence_ontology': {
'hierarchy': [u'SO:0000110', u'SO:0002072', u'SO:0001059'],
'soid': u'SO:1000002',
'parent_name': u'sequence_feature',
'name': u'substitution',
'parent_soid': u'SO:0000110',
},
'swissprots': [u'O15151'],
'protein_effects': [
u'ENSP00000356148.1:p.Ile24Met',
u'ENSP00000396840.2:p.Ile24Met',
u'ENSP00000443816.2:p.Ile24Met',
u'ENSP00000478080.1:p.Ile24Met',
u'XP_011507868.1:p.Ile24Met',
u'XP_011507869.1:p.Ile24Met',
u'ENSP00000478581.1:p.Ile24Met',
u'NP_001265445.1:p.Ile24Met',
u'ENSP00000356151.3:p.Ile24Met',
u'NP_001265446.1:p.Ile24Met',
u'NP_001265447.1:p.Ile24Met',
u'NP_001265448.1:p.Ile24Met',
u'XP_011507870.1:p.Ile24Met',
u'ENSP00000375811.2:p.Ile24Met',
u'ENSP00000482479.1:p.Ile24Met',
u'NP_002384.2:p.Ile24Met',
u'ENSP00000356150.3:p.Ile24Met',
u'NP_001191101.1:p.Ile24Met',
u'ENSP00000482388.1:p.Ile24Met',
u'XP_011507867.1:p.Ile24Met',
u'NP_001191100.1:p.Ile24Met',
u'XP_006711391.1:p.Ile24Met',
],
'start': 204494718L,
'synonyms': [
u'CM000663.1:g.204494718C>G',
u'chr1:g.204525590C>G',
u'chr1:g.204494718C>G',
u'NC_000001.10:g.204494718C>G',
u'NC_000001.9:g.202761341C>G',
u'NG_029367.1:g.14212C>G',
u'CM000663.2:g.204525590C>G',
u'NC_000001.11:g.204525590C>G',
],
'biomarker_type': 'substitution',
'referenceName': 'GRCh37',
'protein_domains': [
{u'db': u'hmmpanther', u'name': u'PTHR10360'},
{u'db': u'Superfamily_domains', u'name': u'SSF47592'},
{u'db': u'PIRSF_domain', u'name': u'PIRSF500699'},
{u'db': u'PIRSF_domain', u'name': u'PIRSF006748'},
{u'db': u'Gene3D', u'name': 1},
],
'geneSymbol': 'MDM4',
'alt': 'G',
'ref': 'C',
'chromosome': '1',
'name': 'MDM4 ',
}
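As a sketch of the final step, the four keys could be attached to the feature like this. The function name is hypothetical, and I am assuming the Pathway Commons search JSON exposes a `searchHit` list whose entries carry a `pathway` list of URIs; that shape should be checked against a real response before relying on it:

```python
def add_harvested_keys(feature, protein_effects, domains, swissprots, pathway_search):
    """Return a copy of feature with the four harvested keys added."""
    pathways = []
    # Assumed response shape: {"searchHit": [{"pathway": [uri, ...]}, ...]}
    for hit in pathway_search.get("searchHit", []):
        for uri in hit.get("pathway", []):
            if uri not in pathways:
                pathways.append(uri)

    enriched = dict(feature)  # shallow copy; the input feature is not modified
    enriched.update(
        {
            "pathways": pathways,
            "swissprots": list(swissprots),
            "protein_effects": list(protein_effects),
            "protein_domains": list(domains),
        }
    )
    return enriched
```

Pathway URIs are de-duplicated in first-seen order, so repeated harvests of the same feature produce the same list.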