w-gao / pyGTEx

A Python module to retrieve Genotype-Tissue Expression (GTEx) data programmatically.

pyGTEx

pyGTEx is a module that provides a set of wrappers for easy access to GTEx data on gene expression across tissues.

NOTE: This project is not affiliated with the Genotype-Tissue Expression (GTEx) Project.

Installing pyGTEx

pyGTEx is available at https://github.com/w-gao/pyGTEx and can be installed with the following command:

pip install git+https://github.com/w-gao/pyGTEx.git

Module Design

Examples

Exploring Available Tissues - TissuesInfoModel

Here is an example that uses one of our Models, TissuesInfoModel, to fetch all tissues supported by GTEx:

# Import pyGTEx like this
import pygtex

# Instantiate a TissuesInfoModel; HTTP requests and parsing are handled behind the scenes.
tModel = pygtex.TissuesInfoModel()
# Fetch the tissueSiteDetailId of every tissue
tissues = tModel.getTissues('tissueSiteDetailId')

What tissues looks like

The first 10 of 54 identified tissue sites in the body are displayed in the table below.

import pandas as pd

# Show the first 10 tissue IDs as a one-column DataFrame
df = pd.DataFrame(tissues[:10], columns=['Tissues'])
df
Tissues
0 Adipose_Subcutaneous
1 Adipose_Visceral_Omentum
2 Adrenal_Gland
3 Artery_Aorta
4 Artery_Coronary
5 Artery_Tibial
6 Bladder
7 Brain_Amygdala
8 Brain_Anterior_cingulate_cortex_BA24
9 Brain_Caudate_basal_ganglia
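The tissue IDs above follow an Organ_Subregion naming pattern. As a minimal sketch, assuming the text before the first underscore is a usable broad-site label (a heuristic, not something pyGTEx documents), the list returned by getTissues can be grouped with the standard library. group_by_site is a hypothetical helper, not part of pyGTEx.

```python
from collections import defaultdict


def group_by_site(tissue_ids):
    """Group tissueSiteDetailId strings by the prefix before the first underscore.

    Heuristic assumption: the prefix names the broad tissue site
    (e.g. "Brain" for "Brain_Amygdala"). IDs without an underscore
    are grouped under their full name.
    """
    groups = defaultdict(list)
    for tissue_id in tissue_ids:
        site = tissue_id.split("_", 1)[0]
        groups[site].append(tissue_id)
    return dict(groups)


# The first 10 IDs from the table above
sample = [
    "Adipose_Subcutaneous", "Adipose_Visceral_Omentum", "Adrenal_Gland",
    "Artery_Aorta", "Artery_Coronary", "Artery_Tibial", "Bladder",
    "Brain_Amygdala", "Brain_Anterior_cingulate_cortex_BA24",
    "Brain_Caudate_basal_ganglia",
]
for site, ids in group_by_site(sample).items():
    print(site, len(ids))
```

The prefix heuristic breaks down for any ID whose first token is not the organ name, so treat the grouping as a convenience rather than an anatomical classification.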

Exploring GeneModel and GenesModel

To get information on one or more genes, use GeneModel or GenesModel, which let you query genes by gene symbol or GENCODE ID.

# Look up a single gene by its symbol
gene = 'ace2'
gModel = pygtex.GeneModel(gene)

print('Input gene:', gene)
print(' - GeneSymbol:', gModel.getGeneSymbol())
print(' - GencodeId:', gModel.getGencodeId())
print(' - EntrezGeneId:', gModel.getEntrezGeneId())
Input gene: ace2
 - GeneSymbol: ACE2
 - GencodeId: ENSG00000130234.10
 - EntrezGeneId: 59272
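The GencodeId returned above, ENSG00000130234.10, is an Ensembl gene ID followed by a version number after the dot. When joining against resources that use unversioned Ensembl IDs, the version usually has to be dropped first. A minimal sketch; split_gencode_id is a hypothetical helper, not part of pyGTEx.

```python
def split_gencode_id(gencode_id):
    """Split a versioned GENCODE gene ID into (stable Ensembl ID, version).

    Returns (gencode_id, None) when there is no purely numeric ".<version>"
    suffix, so IDs with extra tags (e.g. "_PAR_Y") are left unchanged.
    """
    stable_id, sep, version = gencode_id.partition(".")
    if not sep or not version.isdigit():
        return gencode_id, None
    return stable_id, int(version)


print(split_gencode_id("ENSG00000130234.10"))  # ('ENSG00000130234', 10)
```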

Visualization with GTExVisuals

GTExVisuals is installed alongside pyGTEx. It is our application of pyGTEx for easier visualization: depending on the gene and/or tissue data you provide, it can generate Newick trees, heatmaps, or bar graphs.

# Import the module like this
import GTExVisuals
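Newick is a plain-text tree format in which each internal node is a parenthesized, comma-separated list of its children and the whole tree ends with a semicolon, e.g. ((A,B),C);. The sketch below serializes a nested Python tuple into Newick purely to illustrate the format; it is not how GTExVisuals builds its trees. to_newick is a hypothetical helper, and it assumes leaf labels contain no Newick reserved characters (parentheses, commas, colons, semicolons, or whitespace).

```python
def to_newick(node):
    """Serialize a nested tuple/list tree to a Newick string (without the ';').

    Leaves are strings; internal nodes are tuples or lists of child nodes.
    Assumes labels contain no Newick reserved characters.
    """
    if isinstance(node, str):
        return node
    return "(" + ",".join(to_newick(child) for child in node) + ")"


tree = (("Artery_Aorta", "Artery_Coronary"), "Artery_Tibial")
print(to_newick(tree) + ";")  # ((Artery_Aorta,Artery_Coronary),Artery_Tibial);
```

Branch lengths (the optional :length annotations in Newick) are omitted here to keep the example short.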

plotGeneExpression

Explore the tissues in which a gene is most highly expressed

# ADH1C is a cool gene

# "Variants in the DNA coding for ADH can affect how quickly a person converts alcohol into acetaldehyde"
# https://blog.helix.com/alcohol-effects-the-weekly-gene-adh1c/

GTExVisuals.plotGeneExpression('ADH1C', tissues, figsize=[10, 6], sortBy=None)

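plotGeneExpression draws the plot for you. If you instead already have per-sample expression values for each tissue and want the top tissues as a list, a standard-library sketch could rank them by median. The input shape (a dict from tissue ID to a list of sample values) and the helper rank_tissues_by_median are assumptions for illustration, and the numbers in the usage lines are dummy values, not GTEx data.

```python
from statistics import median


def rank_tissues_by_median(expression_by_tissue, top_n=None):
    """Return (tissue, median) pairs sorted from highest to lowest median.

    expression_by_tissue: dict mapping tissue ID -> list of sample values.
    Tissues with no samples are skipped.
    """
    medians = [
        (tissue, median(values))
        for tissue, values in expression_by_tissue.items()
        if values
    ]
    medians.sort(key=lambda pair: pair[1], reverse=True)
    return medians if top_n is None else medians[:top_n]


# Dummy values for illustration only, not real GTEx measurements
dummy = {"Liver": [5.0, 7.0, 9.0], "Stomach": [1.0, 2.0], "Bladder": []}
print(rank_tissues_by_median(dummy, top_n=2))
```

The median is used here because it is less sensitive than the mean to a few extreme samples.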

# Age-related expression

# "Age spots have abnormally high levels of expression of KRT5"
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5342934/#!po=82.1429
skinTissues = ["Skin_Not_Sun_Exposed_Suprapubic", "Skin_Sun_Exposed_Lower_leg"]
GTExVisuals.plotGeneExpression('KRT5', skinTissues, figsize=[15, 6], sortBy="ageBracket", rot=0)

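In the example above, sortBy="ageBracket" appears to break expression down by donor age group; GTEx reports donor ages in 10-year brackets such as 20-29 and 30-39. As a rough sketch of that kind of grouping, assuming you have (age bracket, value) pairs for individual samples, the median per bracket could be computed as below. median_by_age_bracket is a hypothetical helper, and the values are dummy numbers.

```python
from collections import defaultdict
from statistics import median


def median_by_age_bracket(samples):
    """Compute the median value per age bracket.

    samples: iterable of (age_bracket, value) pairs, where age_bracket is a
    string such as "20-29". Brackets are returned sorted by their lower bound.
    """
    groups = defaultdict(list)
    for bracket, value in samples:
        groups[bracket].append(value)
    ordered = sorted(groups, key=lambda b: int(b.split("-")[0]))
    return {bracket: median(groups[bracket]) for bracket in ordered}


# Dummy (bracket, value) pairs for illustration only
dummy = [("60-69", 4.0), ("20-29", 1.0), ("20-29", 3.0), ("60-69", 6.0)]
print(median_by_age_bracket(dummy))
```

Sorting on the numeric lower bound rather than the raw string keeps the brackets in age order even if bracket labels ever differ in width.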

plotTopExpressedGene

See which genes are most highly expressed in a tissue site.

# filterMtGene=True filters out mitochondrial genes
GTExVisuals.plotTopExpressedGene('Liver', filterMtGene=True, figsize=[10, 6])

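The filterMtGene=True argument excludes mitochondrial genes, which can otherwise crowd the top of such a list. In HGNC nomenclature, human mitochondrially encoded genes carry an MT- symbol prefix (e.g. MT-CO1), so a comparable filter over gene symbols can be sketched as below. This is an assumption about the kind of filtering involved, not pyGTEx's implementation, and drop_mt_genes is a hypothetical helper.

```python
def drop_mt_genes(gene_symbols):
    """Remove mitochondrially encoded genes, identified by the HGNC "MT-" prefix.

    Matching is case-insensitive; the order of the remaining symbols is preserved.
    """
    return [s for s in gene_symbols if not s.upper().startswith("MT-")]


print(drop_mt_genes(["MT-CO1", "ALB", "MT-ND4", "APOA1"]))  # ['ALB', 'APOA1']
```

Matching on the hyphen keeps nuclear genes such as MTOR, whose symbols merely start with "MT".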

plotMedianGeneExpression

We can visualize the median expression of multiple genes across selected tissues.

genesOfInterest = ['ALDH2', 'ADH1B', 'ADH1C']
tissuesOfInterest = ['Liver', 'Colon_Transverse', 'Nerve_Tibial', 'Small_Intestine_Terminal_Ileum', 'Stomach', 'Adipose_Subcutaneous', 'Adipose_Visceral_Omentum']

GTExVisuals.plotMedianGeneExpression(genesOfInterest, tissueIds=tissuesOfInterest, figsize=[8, 6])

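Each bar in such a plot summarizes many samples as one median value per gene and tissue. Assuming per-sample data in long form (one row per sample, with hypothetical column names gene, tissue, and value), pandas can collapse it to one median per (gene, tissue) pair. median_expression is a hypothetical helper and the values are dummy numbers, not GTEx data.

```python
import pandas as pd


def median_expression(samples):
    """Collapse per-sample rows to one median per (gene, tissue) pair.

    samples: DataFrame with "gene", "tissue", and "value" columns
    (column names are assumptions for this sketch).
    Returns a DataFrame with "gene", "tissue", and "median" columns.
    """
    return (
        samples.groupby(["gene", "tissue"], as_index=False)["value"]
        .median()
        .rename(columns={"value": "median"})
    )


# Dummy per-sample values for illustration only
samples = pd.DataFrame({
    "gene":   ["ADH1B", "ADH1B", "ADH1B", "ALDH2", "ALDH2"],
    "tissue": ["Liver", "Liver", "Stomach", "Liver", "Liver"],
    "value":  [10.0, 20.0, 3.0, 8.0, 12.0],
})
print(median_expression(samples))
```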

plotMedianGeneExpressionHeatmap

We can visualize median gene expression data with a heatmap.

GTExVisuals.plotMedianGeneExpressionHeatmap(genesOfInterest, tissueIds=tissuesOfInterest, figsize=[10, 5])

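A heatmap needs its data as a genes-by-tissues matrix rather than a long list of (gene, tissue, median) rows. A minimal sketch of that reshaping with pandas, assuming long-form input with hypothetical column names gene, tissue, and median; to_heatmap_matrix is a hypothetical helper and not part of GTExVisuals.

```python
import pandas as pd


def to_heatmap_matrix(medians, genes, tissues):
    """Reshape long-form medians into a genes-by-tissues matrix.

    medians: DataFrame with "gene", "tissue", "median" columns (assumed names),
    holding at most one row per (gene, tissue) pair; pivot raises otherwise.
    Rows and columns follow the order of `genes` and `tissues`; missing
    gene/tissue combinations become NaN.
    """
    matrix = medians.pivot(index="gene", columns="tissue", values="median")
    return matrix.reindex(index=genes, columns=tissues)


# Dummy medians for illustration only
medians = pd.DataFrame({
    "gene":   ["ALDH2", "ADH1B", "ADH1C"],
    "tissue": ["Liver", "Liver", "Stomach"],
    "median": [1.0, 2.0, 3.0],
})
m = to_heatmap_matrix(medians, ["ALDH2", "ADH1B", "ADH1C"], ["Liver", "Stomach"])
print(m)
```

Reindexing with explicit gene and tissue lists keeps the heatmap rows and columns in the order you asked for instead of pandas' sorted default.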

LICENSE

Copyright (C) 2022 William Gao and Rose Delvillar, under the MIT license.
