pyGTEx is a Python module that provides wrappers for easy access to gene expression data across tissues.
NOTE: This project is not affiliated with the Genotype-Tissue Expression (GTEx) Project.
pyGTEx is available at https://github.com/w-gao/pyGTEx and can be installed by the following command:
pip install git+https://github.com/w-gao/pyGTEx.git
The following example uses one of our models to fetch all tissues supported by GTEx:
# Import pyGTEx like this
import pygtex
# Instantiate a TissuesInfoModel; HTTP requests and parsing are handled behind the scenes.
tModel = pygtex.TissuesInfoModel()
tissues = tModel.getTissues('tissueSiteDetailId')
The first 10 of 54 identified tissue sites in the body are displayed in the table below.
import pandas as pd
df = pd.DataFrame(tissues[:10], columns=['Tissues'])
df
| | Tissues |
|---|---|
| 0 | Adipose_Subcutaneous |
| 1 | Adipose_Visceral_Omentum |
| 2 | Adrenal_Gland |
| 3 | Artery_Aorta |
| 4 | Artery_Coronary |
| 5 | Artery_Tibial |
| 6 | Bladder |
| 7 | Brain_Amygdala |
| 8 | Brain_Anterior_cingulate_cortex_BA24 |
| 9 | Brain_Caudate_basal_ganglia |
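Because the tissue IDs are plain strings, they can be organized with ordinary Python before being passed to other calls. The sketch below groups IDs by the text before the first underscore. `group_by_prefix` is a hypothetical helper, not part of pyGTEx, and the prefix is only a naming heuristic, not an official GTEx grouping. The sample list is copied from the table above so the snippet runs without network access.

```python
from collections import defaultdict

# Sample IDs copied from the table above; in practice, pass the full list
# returned by getTissues('tissueSiteDetailId').
sample_tissues = [
    "Adipose_Subcutaneous",
    "Adipose_Visceral_Omentum",
    "Adrenal_Gland",
    "Artery_Aorta",
    "Artery_Coronary",
    "Artery_Tibial",
    "Bladder",
    "Brain_Amygdala",
    "Brain_Anterior_cingulate_cortex_BA24",
    "Brain_Caudate_basal_ganglia",
]


def group_by_prefix(tissue_ids):
    """Group tissue IDs by the text before the first underscore.

    Hypothetical helper for illustration; not part of pyGTEx.
    """
    groups = defaultdict(list)
    for tissue_id in tissue_ids:
        prefix = tissue_id.split("_", 1)[0]
        groups[prefix].append(tissue_id)
    return dict(groups)


groups = group_by_prefix(sample_tissues)
print(groups["Artery"])  # ['Artery_Aorta', 'Artery_Coronary', 'Artery_Tibial']
```

A group such as `groups["Brain"]` could then be used wherever a list of tissue IDs is expected.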
To get information on one or more genes, use the GeneModel or GenesModel, which allow you to query genes by gene symbol or Gencode ID.
gene = 'ace2'
gModel = pygtex.GeneModel(gene)
print('Input gene:', gene)
print(' - GeneSymbol:', gModel.getGeneSymbol())
print(' - GencodeId:', gModel.getGencodeId())
print(' - EntrezGeneId:', gModel.getEntrezGeneId())
Input gene: ace2
- GeneSymbol: ACE2
- GencodeId: ENSG00000130234.10
- EntrezGeneId: 59272
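Since a gene can be queried by either a gene symbol or a Gencode ID, it can be useful to check which kind of identifier you are holding before building a query. The following is a minimal sketch under the assumption that Gencode IDs for human genes look like `ENSG` followed by 11 digits and an optional version suffix, as in the output above. `classify_gene_query` is a hypothetical helper and says nothing about how pyGTEx distinguishes the two internally.

```python
import re

# Assumed shape of a human Gencode gene ID: "ENSG", 11 digits, and an
# optional ".<version>" suffix (for example ENSG00000130234.10).
GENCODE_ID_PATTERN = re.compile(r"^ENSG\d{11}(\.\d+)?$", re.IGNORECASE)


def classify_gene_query(query):
    """Return 'gencodeId' if the query looks like a Gencode ID, else 'geneSymbol'.

    Hypothetical helper for illustration; not part of pyGTEx.
    """
    if GENCODE_ID_PATTERN.match(query.strip()):
        return "gencodeId"
    return "geneSymbol"


print(classify_gene_query("ENSG00000130234.10"))  # gencodeId
print(classify_gene_query("ace2"))                # geneSymbol
```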
GTExVisuals is installed together with pyGTEx. It is our application of pyGTEx that makes visualization easier: depending on the type of gene and/or tissue data given, it can generate Newick trees, heatmaps, or bar graphs.
# Import the module like this
import GTExVisuals
Explore the tissues where a gene is most expressed
# ADH1C is a cool gene
# "Variants in the DNA coding for ADH can affect how quickly a person converts alcohol into acetaldehyde"
# https://blog.helix.com/alcohol-effects-the-weekly-gene-adh1c/
GTExVisuals.plotGeneExpression('ADH1C', tissues, figsize=[10, 6], sortBy=None)
# Age related expression
# "Age spots have abnormally high levels of expression of KRT5"
# https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5342934/#!po=82.1429
skinTissues = ["Skin_Not_Sun_Exposed_Suprapubic", "Skin_Sun_Exposed_Lower_leg"]
GTExVisuals.plotGeneExpression('KRT5', skinTissues, figsize=[15, 6], sortBy="ageBracket", rot=0)
See which genes are most highly expressed in a tissue site.
GTExVisuals.plotTopExpressedGene('Liver', filterMtGene=True, figsize=[10, 6])
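The `filterMtGene=True` argument suggests that mitochondrial genes are excluded from the ranking. How GTExVisuals does this is not documented here, so the sketch below is only an assumption about the idea: human mitochondrially encoded genes carry the `MT-` symbol prefix, so they can be dropped with a simple string check. `drop_mitochondrial` is a hypothetical helper, and the symbols in `example_symbols` are illustrative, not GTEx output.

```python
def drop_mitochondrial(gene_symbols):
    """Remove gene symbols that start with the 'MT-' prefix.

    Hypothetical helper for illustration; not part of pyGTEx or GTExVisuals.
    """
    return [s for s in gene_symbols if not s.upper().startswith("MT-")]


# Illustrative symbols only; these are not results returned by GTEx.
example_symbols = ["MT-CO1", "ALB", "MT-ND4", "APOA1"]
print(drop_mitochondrial(example_symbols))  # ['ALB', 'APOA1']
```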
We can visualize median gene expression for multiple genes (given here as gene symbols) across particular tissues.
genesOfInterest = ['ALDH2', 'ADH1B', 'ADH1C']
tissuesOfInterest = ['Liver', 'Colon_Transverse', 'Nerve_Tibial', 'Small_Intestine_Terminal_Ileum', 'Stomach', 'Adipose_Subcutaneous', 'Adipose_Visceral_Omentum']
GTExVisuals.plotMedianGeneExpression(genesOfInterest, tissueIds=tissuesOfInterest, figsize=[8, 6])
We can also visualize median gene expression data as a heatmap.
GTExVisuals.plotMedianGeneExpressionHeatmap(genesOfInterest, tissueIds=tissuesOfInterest, figsize=[10, 5])
Copyright (C) 2022 William Gao and Rose Delvillar, under the MIT license.