ahmadpgh / ImaGene

ImaGene

ImaGene implements a supervised machine learning algorithm to predict natural selection and estimate selection coefficients from population genomic data. Specifically, it uses a convolutional neural network (CNN) which takes as input haplotypes for a population and locus of interest. It outputs confusion matrices as well as point estimates of the selection coefficient along with its posterior distribution and various metrics of confidence.
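To make the input and output described above more concrete, here is a minimal sketch in plain Python. It assumes haplotypes are given as equal-length strings of 0 and 1 alleles, and that the output is a discrete posterior over a grid of selection coefficients. The function names, the toy haplotypes and the grid values are illustrative assumptions and are not part of ImaGene's API.

```python
def haplotypes_to_matrix(haplotypes):
    """Convert equal-length '0'/'1' strings into a binary matrix.

    Rows are haplotypes and columns are sites, i.e. the kind of
    image-like input a CNN can take.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    n_sites = len(haplotypes[0])
    matrix = []
    for hap in haplotypes:
        if len(hap) != n_sites:
            raise ValueError("all haplotypes must have the same length")
        if set(hap) - {"0", "1"}:
            raise ValueError("only 0/1 allele coding is supported in this sketch")
        matrix.append([int(allele) for allele in hap])
    return matrix


def summarise_posterior(grid, probs):
    """Return (MAP estimate, posterior mean) for a discrete posterior."""
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    probs = [p / total for p in probs]  # normalise defensively
    best = max(range(len(grid)), key=probs.__getitem__)
    mean = sum(s * p for s, p in zip(grid, probs))
    return grid[best], mean


# Toy data (hypothetical): three haplotypes at four sites.
image = haplotypes_to_matrix(["0101", "0111", "0001"])

# Hypothetical grid of selection coefficients and class probabilities.
grid = [0, 100, 200, 300]
probs = [0.1, 0.2, 0.6, 0.1]
map_s, mean_s = summarise_posterior(grid, probs)
```

The MAP value is the grid point with the highest probability, while the posterior mean weights every grid point by its probability; the two can differ when the posterior is skewed.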

Citation

The original manuscript can be found here and it is open access. You should cite it as:

Torada, L., Lorenzon, L., Beddis, A. et al. ImaGene: a convolutional neural network to quantify natural selection from genomic data. BMC Bioinformatics 20, 337 (2019)

doi:10.1186/s12859-019-2927-x

and you can download the citation file here

Download and installation

Download the repository using git.

git clone https://github.com/mfumagalli/ImaGene

ImaGene runs under Python 3 and interfaces with Keras. We recommend using conda to set up the environment and take care of all dependencies. There are detailed instructions on how to download conda for Linux and macOS.
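Once the environment is set up, a quick sanity check along these lines may help. This is a sketch using only the standard library: the helper names are hypothetical, and keras is the only dependency checked by default because it is the only one named above. Extend the list to match your own environment.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_python_env(required=("keras",)):
    """Report whether we run under Python 3 and which modules are missing."""
    return {
        "python3": sys.version_info.major == 3,
        "missing": missing_modules(required),
    }


report = check_python_env()
print(report)
```

An empty "missing" list means every listed module can be located by the interpreter; it does not guarantee that the installed versions are compatible.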

ImaGene is currently interfaced with msms, which you must download separately by following the instructions here. Follow the link, download the .zip folder and extract it. The .jar file of interest is in the lib folder. msms does not need to be installed in any specific folder, but it requires java. On Debian-based systems, type:

sudo apt-get update && sudo apt-get upgrade
sudo apt-get install default-jdk

Otherwise, follow the link here if you need to install java on other systems. Remember that java must be in your /usr/bin folder. On unix systems you can create a symbolic link, for example:

sudo ln -s ~/Downloads/java-XXX/jre/bin/java /usr/bin/java
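To confirm that both pieces are in place before running simulations, a small check such as the following can be used. It is a sketch only: the function name is hypothetical, and the path to the .jar file is a placeholder that you should replace with the location where you extracted msms.

```python
import shutil
from pathlib import Path


def check_msms_setup(msms_jar):
    """Report where java is on the PATH and whether the msms .jar exists."""
    jar = Path(msms_jar).expanduser()
    return {
        "java": shutil.which("java"),  # None if java is not on the PATH
        "jar_found": jar.is_file() and jar.suffix == ".jar",
    }


# Placeholder path (assumption): adjust to your own download location.
status = check_msms_setup("~/Downloads/msms/lib/msms.jar")
print(status)
```

If "java" is None, java is not reachable from your shell; if "jar_found" is False, the path does not point to an existing .jar file.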

Usage

Please see the Jupyter notebook Tutorial_binary.ipynb for a short tutorial on how to use ImaGene to predict natural selection with a simple binary classification. We also provide examples of how ImaGene can be used for multiclass classification in Tutorial_multiclass.ipynb and Tutorial_continuous.ipynb.
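For readers unfamiliar with how a binary classifier is evaluated, the sketch below builds a confusion matrix from true and predicted labels in plain Python. It is not taken from the tutorials: the function names are hypothetical, the labels are toy data, and the coding 0 = neutral, 1 = selection is an assumption made for illustration.

```python
def confusion_matrix(true_labels, predicted_labels, classes=(0, 1)):
    """Count predictions per class: rows are true classes, columns predicted."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of cases on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels (hypothetical): 0 = neutral, 1 = selection.
truth = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 1, 0, 1, 1, 0, 1, 0]

cm = confusion_matrix(truth, predicted)
print(cm, accuracy(cm))
```

The same function works for the multiclass case by passing a longer tuple of classes.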

The folder Reproduce contains all scripts used for the analyses shown in the manuscript. The folder HPC should be ignored.

Contributors (in alphabetical order)

Alice Beddis, Matteo Fumagalli, Ulas Isildak, Lucrezia Lorenzon, Luis Torada

About

License: GNU General Public License v3.0

