ohsu-comp-bio / g2p-aggregator

Associations of genomic features, drugs and diseases

Harvest feature's protein effects, domains and pathways

bwalsh opened this issue

Creating this issue as a placeholder for future discussion.

We have already done some of this work here on our local instance at OHSU.

As a researcher, in order to find more evidence for a particular allele, I would like the system to match by:

  • protein effect
  • protein domain
  • pathway

Given a feature:

{'start': 204494718L, 'description': '', 'name': 'MDM4 ', 'referenceName': 'GRCh37', 'geneSymbol': 'MDM4', 'alt': 'G', 'ref': 'C', 'chromosome': '1', 'biomarker_type': 'substitution'}

Since we already have the genomic location, our current process starts by looking the allele up in the allele registry:

http://reg.genome.network/allele?hgvs=NC_000001.10%3Ag.204494718C%3EG
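
The registry URL above can be derived from the feature itself. A minimal sketch, assuming Python 3 (so no `L` suffix on integers) and a hypothetical `GRCH37_REFSEQ` table mapping chromosome names to RefSeq accessions. Only chromosome 1 is filled in, since that is the only accession shown in this issue. The function names are illustrative and not taken from this repository:

```python
from urllib.parse import quote

# Hypothetical lookup table: chromosome -> GRCh37 RefSeq accession.
# Only the entry that appears in this issue is listed here.
GRCH37_REFSEQ = {"1": "NC_000001.10"}


def genomic_hgvs(feature):
    """Build a g. HGVS string for a simple substitution feature."""
    accession = GRCH37_REFSEQ[feature["chromosome"]]
    return "{}:g.{}{}>{}".format(
        accession, feature["start"], feature["ref"], feature["alt"])


def allele_registry_url(feature):
    """Percent-encode the HGVS string into an allele registry query URL."""
    return ("http://reg.genome.network/allele?hgvs="
            + quote(genomic_hgvs(feature), safe=""))


feature = {
    "start": 204494718,
    "referenceName": "GRCh37",
    "geneSymbol": "MDM4",
    "alt": "G",
    "ref": "C",
    "chromosome": "1",
    "biomarker_type": "substitution",
}
print(allele_registry_url(feature))
```

This only covers substitutions. Other biomarker types would need their own HGVS formatting.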

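The new process would continue harvesting with Ensembl VEP and Pathway Commons. Note that VEP cannot resolve some of the protein identifiers and returns an error for those: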

https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000356148.1:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000396840.2:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000443816.2:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000478080.1:p.Ile24Met?domains=1&protein=1&uniprot=1
{u'error': u"Unable to parse HGVS notation 'ENSP00000478080.1:p.Ile24Met': Could not get a Transcript object for 'ENSP00000478080'"} https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000478080.1:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000478581.1:p.Ile24Met?domains=1&protein=1&uniprot=1
{u'error': u"Unable to parse HGVS notation 'ENSP00000478581.1:p.Ile24Met': Could not get a Transcript object for 'ENSP00000478581'"} https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000478581.1:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000356151.3:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000375811.2:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000482479.1:p.Ile24Met?domains=1&protein=1&uniprot=1
{u'error': u"Unable to parse HGVS notation 'ENSP00000482479.1:p.Ile24Met': Could not get a Transcript object for 'ENSP00000482479'"} https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000482479.1:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000356150.3:p.Ile24Met?domains=1&protein=1&uniprot=1
https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000482388.1:p.Ile24Met?domains=1&protein=1&uniprot=1
{u'error': u"Unable to parse HGVS notation 'ENSP00000482388.1:p.Ile24Met': Could not get a Transcript object for 'ENSP00000482388'"} https://grch37.rest.ensembl.org/vep/human/hgvs/ENSP00000482388.1:p.Ile24Met?domains=1&protein=1&uniprot=1
http://www.pathwaycommons.org/pc2/search.json?q=O15151&organism=homo%20sapiens
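
The request URLs above follow a fixed pattern, and some VEP calls come back as an `error` payload instead of a result. A minimal sketch of building the URLs and setting the failed lookups aside, assuming Python 3 and that each response has already been fetched and decoded from JSON. The helper names are hypothetical, and no network call is made here:

```python
from urllib.parse import quote

VEP_BASE = "https://grch37.rest.ensembl.org/vep/human/hgvs/"
PC_SEARCH = "http://www.pathwaycommons.org/pc2/search.json"


def vep_url(protein_hgvs):
    """VEP lookup for one protein HGVS string, with the flags used above."""
    return VEP_BASE + protein_hgvs + "?domains=1&protein=1&uniprot=1"


def pathway_commons_url(uniprot_id, organism="homo sapiens"):
    """Pathway Commons search for one UniProt accession."""
    return "{}?q={}&organism={}".format(
        PC_SEARCH, quote(uniprot_id), quote(organism))


def split_vep_responses(responses):
    """Separate usable responses from error payloads.

    `responses` maps protein HGVS -> decoded JSON body. A body that is
    a dict with an 'error' key is treated as a failed lookup, as in the
    output above; anything else is passed through untouched.
    """
    ok, failed = {}, {}
    for hgvs, body in responses.items():
        if isinstance(body, dict) and "error" in body:
            failed[hgvs] = body["error"]
        else:
            ok[hgvs] = body
    return ok, failed
```

Keeping the failures in their own dict, rather than dropping them, leaves room to retry them against RefSeq identifiers or to log them for later review.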

The result will add the following keys to the feature:

  • pathways
  • swissprots
  • protein_effects
  • protein_domains

For example:

{
    'pathways': [
        u'http://pathwaycommons.org/pc2/Pathway_1d99625aa7f8f4a3f3189e8b237522c3',
        u'http://identifiers.org/reactome/R-HSA-597592',
        u'http://identifiers.org/reactome/R-HSA-212436',
        u'http://identifiers.org/reactome/R-HSA-5688426',
        u'http://identifiers.org/reactome/R-HSA-69563',
        u'http://identifiers.org/reactome/R-HSA-69541',
        u'http://identifiers.org/panther.pathway/P00033',
        u'http://identifiers.org/reactome/R-HSA-69580',
        u'http://identifiers.org/panther.pathway/P00059',
        u'http://identifiers.org/reactome/R-HSA-8953897',
        u'http://identifiers.org/reactome/R-HSA-6804757',
        u'http://identifiers.org/reactome/R-HSA-6804756',
        u'http://identifiers.org/reactome/R-HSA-69620',
        u'http://identifiers.org/panther.pathway/P04392',
        u'http://identifiers.org/reactome/R-HSA-1640170',
        u'http://identifiers.org/reactome/R-HSA-73857',
        u'http://identifiers.org/reactome/R-HSA-5633007',
        u'http://identifiers.org/reactome/R-HSA-74160',
        u'http://identifiers.org/reactome/R-HSA-2262752',
        u'http://identifiers.org/reactome/R-HSA-69615',
        u'http://identifiers.org/reactome/R-HSA-5689880',
        u'http://identifiers.org/reactome/R-HSA-2559585',
        u'http://identifiers.org/reactome/R-HSA-3700989',
        u'http://identifiers.org/reactome/R-HSA-6804760',
        u'http://identifiers.org/reactome/R-HSA-2559580',
        u'http://identifiers.org/reactome/R-HSA-2559583',
        u'http://identifiers.org/reactome/R-HSA-6806003',
        u'http://identifiers.org/reactome/R-HSA-392499',
        ],
    'provenance_rule': 'from_source',
    'provenance': ['http://reg.genome.network/allele?hgvs=NC_000001.10%3Ag.204494718C%3EG'],
    'end': 204494718L,
    'description': '',
    'links': [
        u'http://myvariant.info/v1/variant/chr1:g.204525590C>G?assembly=hg38',
        u'http://myvariant.info/v1/variant/chr1:g.204494718C>G?assembly=hg19',
        u'http://reg.genome.network/refseq/RS000001',
        u'http://reg.genome.network/refseq/RS000025',
        u'http://reg.genome.network/refseq/RS000049',
        u'http://reg.genome.network/refseq/RS004351',
        u'http://reg.genome.network/allele/CA344360355',
        ],
    'sequence_ontology': {
        'hierarchy': [u'SO:0000110', u'SO:0002072', u'SO:0001059'],
        'soid': u'SO:1000002',
        'parent_name': u'sequence_feature',
        'name': u'substitution',
        'parent_soid': u'SO:0000110',
        },
    'swissprots': [u'O15151'],
    'protein_effects': [
        u'ENSP00000356148.1:p.Ile24Met',
        u'ENSP00000396840.2:p.Ile24Met',
        u'ENSP00000443816.2:p.Ile24Met',
        u'ENSP00000478080.1:p.Ile24Met',
        u'XP_011507868.1:p.Ile24Met',
        u'XP_011507869.1:p.Ile24Met',
        u'ENSP00000478581.1:p.Ile24Met',
        u'NP_001265445.1:p.Ile24Met',
        u'ENSP00000356151.3:p.Ile24Met',
        u'NP_001265446.1:p.Ile24Met',
        u'NP_001265447.1:p.Ile24Met',
        u'NP_001265448.1:p.Ile24Met',
        u'XP_011507870.1:p.Ile24Met',
        u'ENSP00000375811.2:p.Ile24Met',
        u'ENSP00000482479.1:p.Ile24Met',
        u'NP_002384.2:p.Ile24Met',
        u'ENSP00000356150.3:p.Ile24Met',
        u'NP_001191101.1:p.Ile24Met',
        u'ENSP00000482388.1:p.Ile24Met',
        u'XP_011507867.1:p.Ile24Met',
        u'NP_001191100.1:p.Ile24Met',
        u'XP_006711391.1:p.Ile24Met',
        ],
    'start': 204494718L,
    'synonyms': [
        u'CM000663.1:g.204494718C>G',
        u'chr1:g.204525590C>G',
        u'chr1:g.204494718C>G',
        u'NC_000001.10:g.204494718C>G',
        u'NC_000001.9:g.202761341C>G',
        u'NG_029367.1:g.14212C>G',
        u'CM000663.2:g.204525590C>G',
        u'NC_000001.11:g.204525590C>G',
        ],
    'biomarker_type': 'substitution',
    'referenceName': 'GRCh37',
    'protein_domains': [
        {u'db': u'hmmpanther', u'name': u'PTHR10360'},
        {u'db': u'Superfamily_domains', u'name': u'SSF47592'},
        {u'db': u'PIRSF_domain', u'name': u'PIRSF500699'},
        {u'db': u'PIRSF_domain', u'name': u'PIRSF006748'},
        {u'db': u'Gene3D', u'name': 1},
        ],
    'geneSymbol': 'MDM4',
    'alt': 'G',
    'ref': 'C',
    'chromosome': '1',
    'name': 'MDM4 ',
    }
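
Once features carry these keys, matching by protein effect, protein domain, or pathway could be as simple as intersecting them. A minimal sketch, assuming Python 3 and features shaped like the example above. `shared_evidence` is a hypothetical helper, not existing code in this repository:

```python
def shared_evidence(a, b):
    """Return, per key, the values two enriched features have in common."""

    def domain_keys(feature):
        # Domains are dicts, so reduce each one to a hashable (db, name) pair.
        return {(d["db"], d["name"])
                for d in feature.get("protein_domains", [])}

    return {
        "protein_effects": (set(a.get("protein_effects", []))
                            & set(b.get("protein_effects", []))),
        "pathways": set(a.get("pathways", [])) & set(b.get("pathways", [])),
        "protein_domains": domain_keys(a) & domain_keys(b),
    }


first = {
    "protein_effects": ["NP_002384.2:p.Ile24Met"],
    "pathways": ["http://identifiers.org/reactome/R-HSA-69563"],
    "protein_domains": [{"db": "hmmpanther", "name": "PTHR10360"}],
}
second = {
    "protein_effects": ["NP_002384.2:p.Ile24Met"],
    "pathways": ["http://identifiers.org/reactome/R-HSA-69541"],
    "protein_domains": [{"db": "hmmpanther", "name": "PTHR10360"}],
}
match = shared_evidence(first, second)
```

Protein effects are compared as whole strings here, so two features only match on that key when they share the same protein accession and the same change. Whether matching should ignore the accession is an open question for the discussion.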