rec3141 / OGT_prediction

Scripts for calculating features and regression of prokaryote OGT

OGT_prediction

Scripts for calculating features from genomic sequences, fitting multiple linear regression models that relate those features to the optimal growth temperatures (OGTs) of the originating species, and predicting a species' OGT from those models.
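As a rough illustration of what "multiple linear regression of features to OGT" means, the sketch below fits an intercept plus one coefficient per feature by ordinary least squares and then applies the fitted model. It is not the repository's own regression code. The feature values are made-up toy numbers, and the helper names `fit_linear_model` and `predict_ogt` are hypothetical.

```python
import numpy as np


def fit_linear_model(features: np.ndarray, ogt: np.ndarray) -> np.ndarray:
    """Ordinary least-squares fit; returns [intercept, coef_1, ..., coef_k]."""
    # Prepend a column of ones so the first coefficient acts as the intercept.
    design = np.column_stack([np.ones(len(features)), features])
    coefs, _residuals, _rank, _sv = np.linalg.lstsq(design, ogt, rcond=None)
    return coefs


def predict_ogt(coefs: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Apply a fitted model: intercept + sum of (coefficient * feature)."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coefs


# Toy, made-up values (NOT real genome features): 5 "species" x 2 features.
X = np.array([[0.50, 0.10],
              [0.55, 0.20],
              [0.60, 0.15],
              [0.65, 0.30],
              [0.70, 0.25]])
# Noiseless toy target, so the fit should recover the coefficients exactly.
y = 10.0 + 50.0 * X[:, 0] - 20.0 * X[:, 1]

coefs = fit_linear_model(X, y)
print(np.round(coefs, 3))
```

With real data the target would be each species' measured OGT and the columns would be the genome-derived features, so the fit would no longer be exact.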

See: Sauer & Wang. Predicting the optimal growth temperatures of prokaryotes using only genome derived features. Bioinformatics (2019) https://doi.org/10.1093/bioinformatics/btz059

Installation and Requirements

This has been developed and tested on Ubuntu 18.04 LTS. The scripts should work on any system supporting Python 3, so long as the external programs are installed properly.

  1. Download these scripts. The easiest way is to clone the repository with git:
git clone https://github.com/DavidBSauer/OGT_prediction
  2. Install the requirements. These scripts depend on the programs Python 3, tRNAscan-SE, Bedtools, Barrnap, and Prodigal, each of which has its own dependencies. The following Python packages must also be installed: numpy, scipy, matplotlib, biopython, bcbio-gff, tqdm, sklearn (installed as scikit-learn), and matplotlib-venn.
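A quick way to see what is still missing before running anything is a small check script like the one below. It is a sketch, not part of the repository. It assumes the import names `Bio` (biopython), `BCBio` (bcbio-gff), `sklearn` (scikit-learn), and `matplotlib_venn` (matplotlib-venn), and assumes the external programs are installed on your PATH under the names shown. The function names are hypothetical.

```python
import importlib.util
import shutil

# pip package name -> name used in "import ..." (assumed mapping).
PYTHON_MODULES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "matplotlib": "matplotlib",
    "biopython": "Bio",
    "bcbio-gff": "BCBio",
    "tqdm": "tqdm",
    "scikit-learn": "sklearn",
    "matplotlib-venn": "matplotlib_venn",
}

# Executable names assumed for the external programs.
EXTERNAL_PROGRAMS = ["tRNAscan-SE", "bedtools", "barrnap", "prodigal"]


def missing_python_packages(modules=PYTHON_MODULES):
    """Return pip names whose top-level module cannot be found (nothing is imported)."""
    return [pip_name for pip_name, module in modules.items()
            if importlib.util.find_spec(module) is None]


def missing_programs(programs=EXTERNAL_PROGRAMS):
    """Return program names that are not found on PATH."""
    return [program for program in programs if shutil.which(program) is None]


if __name__ == "__main__":
    print("Missing Python packages:", missing_python_packages() or "none")
    print("Missing external programs:", missing_programs() or "none")
```

Using `find_spec` only locates each package without importing it, so the check stays fast even for heavy packages such as matplotlib.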

To install everything on Ubuntu (or another system that uses the apt package manager), go into the downloaded directory and run the pre-made bash script:

cd OGT_prediction
./Ubuntu_setup.bash

If you're on another OS, install all of the Python packages, then install bedtools, tRNAscan-SE, barrnap, and prodigal along with their dependencies. Create a file called external_tools.txt in which each line gives one program name (bedtools, tRNAscan-SE, barrnap, or prodigal), a tab, and the absolute path to that program's executable. Copy this external_tools.txt file into both the feature_calculation and prediction directories.
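The external_tools.txt step can be automated roughly as follows. This is a hedged sketch, not a script shipped with the repository. It assumes the four programs are on your PATH, that the tool names are spelled exactly as written in this README (compare with Ubuntu_setup.bash if in doubt), and that it is run from the top-level OGT_prediction directory. The function names are hypothetical.

```python
import os
import shutil
import sys
from pathlib import Path

# Tool names as written in this README (assumed to match what the scripts expect).
TOOLS = ["bedtools", "tRNAscan-SE", "barrnap", "prodigal"]
TARGET_DIRS = ["feature_calculation", "prediction"]


def external_tools_lines(tools=TOOLS, which=shutil.which):
    """Return ("name<TAB>/absolute/path" lines, names not found on PATH)."""
    lines, missing = [], []
    for tool in tools:
        path = which(tool)
        if path is None:
            missing.append(tool)
        else:
            lines.append(f"{tool}\t{os.path.abspath(path)}")
    return lines, missing


def write_external_tools(repo_dir: Path, lines) -> None:
    """Write the same external_tools.txt into each directory that needs it."""
    text = "\n".join(lines) + "\n"
    for sub in TARGET_DIRS:
        (repo_dir / sub / "external_tools.txt").write_text(text)


def main() -> int:
    repo_dir = Path.cwd()  # assumed: the top-level OGT_prediction directory
    lines, missing = external_tools_lines()
    if missing:
        print("Not found on PATH:", ", ".join(missing), file=sys.stderr)
        return 1
    if not all((repo_dir / sub).is_dir() for sub in TARGET_DIRS):
        print("Run this from the top-level OGT_prediction directory.", file=sys.stderr)
        return 1
    write_external_tools(repo_dir, lines)
    return 0


if __name__ == "__main__":
    main()
```

The file is only written when every tool is found, so a half-complete external_tools.txt is never left behind.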

Demonstration

This demonstration uses the previously computed regression models to predict the OGTs of a few species. Start from the top-level OGT_prediction directory.

  1. Move into the prediction directory.
cd prediction
  2. Download genomes for the species IN the provided list.
python3 genome_retriever.py ../data/prediction_demo/species.txt IN
  3. Download the taxonomic classification of each species (replace the example address with your own email address).
python3 clade_retriever.py ../data/prediction_demo/species.txt your_email_address@awesome.com
  4. Run the prediction script to predict the OGT of each species.
python3 prediction_pipeline.py ../data/prediction_demo/regression_models/ genomes_retrieved.txt species_taxonomic.txt
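The demo steps above can also be chained from Python, stopping at the first step that fails. This is only a convenience sketch: the arguments are copied from the commands above, while the function names and the use of `sys.executable` (the current interpreter, instead of a literal `python3`) are assumptions of this example.

```python
import subprocess
import sys
from pathlib import Path

# Arguments copied from the demo commands; paths are relative to the prediction directory.
SPECIES_LIST = "../data/prediction_demo/species.txt"
MODELS_DIR = "../data/prediction_demo/regression_models/"


def demo_commands(email: str, python: str = sys.executable) -> list:
    """The three demo steps as argument lists, in the order they must run."""
    return [
        [python, "genome_retriever.py", SPECIES_LIST, "IN"],
        [python, "clade_retriever.py", SPECIES_LIST, email],
        [python, "prediction_pipeline.py", MODELS_DIR,
         "genomes_retrieved.txt", "species_taxonomic.txt"],
    ]


def run_demo(prediction_dir: Path, email: str) -> None:
    """Run every step inside prediction_dir."""
    for cmd in demo_commands(email):
        # check=True raises CalledProcessError at the first failing step,
        # so later steps never run on incomplete inputs.
        subprocess.run(cmd, cwd=prediction_dir, check=True)
```

For example, `run_demo(Path("OGT_prediction/prediction"), "you@example.org")` runs the whole demo from the directory that contains the cloned repository.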

The final result is written to newly_predicted_OGTs.txt, which lists each species, its predicted OGT, and the taxonomic model used for the prediction. (Note that these results are not deterministic: the genomes available for each species may change over time.)
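To use the predictions in further analysis, a reader along these lines may help. It assumes, without confirmation from this README, that newly_predicted_OGTs.txt is tab-separated with the species, predicted OGT, and model in the first three columns, and it skips any row whose OGT column is not numeric (such as a header). `Prediction` and `parse_predictions` are hypothetical names.

```python
import csv
from typing import Iterable, List, NamedTuple


class Prediction(NamedTuple):
    species: str
    ogt: float   # predicted OGT, in whatever unit the pipeline reports
    model: str   # taxonomic model used for the prediction


def parse_predictions(lines: Iterable[str]) -> List[Prediction]:
    """Parse rows assumed to be: species <TAB> predicted OGT <TAB> model."""
    results = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # blank or malformed line
        try:
            ogt = float(row[1])
        except ValueError:
            continue  # non-numeric OGT column, e.g. a header row
        results.append(Prediction(row[0].strip(), ogt, row[2].strip()))
    return results
```

Usage: `with open("newly_predicted_OGTs.txt", newline="") as fh: predictions = parse_predictions(fh)`. If the file turns out to use a different delimiter, only the `delimiter` argument needs to change.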

About

License: GNU General Public License v3.0


Languages

Python 98.6%, Shell 1.4%