allydunham / yeast_strains

Genotype to phenotype prediction using transcriptomics and proteomics data.

Phenotype prediction of 1,011 S. cerevisiae strains using genotype, transcriptome and proteome data

This analysis extends some of my previous work, developing genotype-to-phenotype prediction methods based on omics data from 1,011 S. cerevisiae strains (see Peter et al. 2018). The transcriptomics data were provided by the Schacherer lab and the proteomics data by the Ralser lab. Neither dataset is currently available. The growth phenotypes were measured by members of the Beltrao lab, where I performed this work, as detailed in Galardini et al. (2019).

This repo contains several phenotype prediction analyses, based on genotype (modelled with P(Aff) scores) and on gene expression and abundance scores, both expressed as fold changes compared to that gene's median expression:

  • Associations and correlations between P(Aff), expression and abundance, showing generally weak relationships.
  • Phenotype prediction based on linear models using the first 50 PCs of the P(Aff), abundance and expression scores.
  • Variational Auto-Encoder (VAE) based linear phenotype prediction models, using a custom VAE implementation.
  • Gene-based linear models, assessing the strength of association between each gene and each phenotype, based on genotype, expression and abundance.
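
As a rough illustration of the fold-change scoring and the gene-based linear models listed above, here is a minimal Python sketch. It is not the code used in this repo. The use of log2 fold changes, the function names, the gene names and the toy values are all assumptions made for the example, and it only covers a single predictor per gene.

```python
import math
from statistics import median


def log2_fold_change(values):
    """Log2 fold change of each strain's value relative to the gene's median.

    Assumption: a log2 scale is used; the README only says "fold changes".
    """
    med = median(values)
    return [math.log2(v / med) for v in values]


def simple_linear_fit(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared


# Hypothetical toy data: expression of two genes across five strains,
# and one growth phenotype score per strain.
expression = {
    "geneA": [1.0, 2.0, 4.0, 8.0, 16.0],
    "geneB": [4.0, 5.0, 4.0, 3.0, 4.0],
}
phenotype = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Fit one linear model per gene and record the strength of association.
associations = {}
for gene, values in expression.items():
    fold_changes = log2_fold_change(values)
    associations[gene] = simple_linear_fit(fold_changes, phenotype)

for gene, (slope, intercept, r_squared) in associations.items():
    print(f"{gene}: slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

In this toy data geneA's fold changes track the phenotype exactly, while geneB shows only a weak association. The same per-gene loop could be run on P(Aff) or abundance scores instead of expression.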

The project is split as follows:

  • bin:
    • analysis - Scripts performing data analysis and figure generation
    • processing - Scripts to parse, format and normalise the raw data
    • util - Additional utility scripts
  • docs - Two lists of genes identified
  • src - Shared modules and R config
