StrainEst - abundance estimation of strains

StrainEst

StrainEst is a novel, reference-based method that uses the Single Nucleotide Variants (SNV) profiles of the available genomes of selected species to determine the number and identity of coexisting strains and their relative abundances in mixed metagenomic samples.

Using Docker (that is, on MS Windows, Mac OS X and Linux!)

The easiest way to run StrainEst is through Docker. Docker works similarly to a virtual machine image, providing a container in which all the software has already been installed, configured and tested.

  1. Install Docker for Linux, Mac OS X or Windows.

  2. Download the latest version of StrainEst:

    docker pull compmetagen/strainest

  3. Run an instance of the image, mounting the host working directory (e.g. /Users/davide/strainest) onto the container working directory /strainest:

    docker run --rm -t -i -v /Users/davide/strainest:/strainest -w /strainest compmetagen/strainest /bin/bash

    On Windows, write the volume option as -v //c/Users/davide/strainest:/strainest; on Linux, use something like -v /home/davide/strainest:/strainest. The --rm option automatically removes the container when it exits.

  4. Now you can use strainest:

    root@68f6784e1101:/strainest# strainest --help

Sickle, Bowtie2 and samtools are preinstalled in the Docker image.
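
The volume-mount conventions from step 3 can be sketched in a few lines of Python (written for Python 3, independent of StrainEst itself). This is only an illustration: the helper names docker_volume_arg and docker_run_command are hypothetical and not part of StrainEst or Docker, and the Windows conversion simply follows the //c/Users/... form shown above.

```python
import re


def docker_volume_arg(host_dir, container_dir="/strainest"):
    """Build the value passed to docker's -v option (hypothetical helper)."""
    # A Windows drive path such as C:\Users\davide\strainest is rewritten
    # to the //c/Users/davide/strainest form used in step 3.
    match = re.match(r"^([A-Za-z]):[\\/](.*)$", host_dir)
    if match:
        drive, rest = match.groups()
        host_dir = "//{}/{}".format(drive.lower(), rest.replace("\\", "/"))
    return "{}:{}".format(host_dir, container_dir)


def docker_run_command(host_dir, image="compmetagen/strainest"):
    """Return the docker run command from step 3 as an argument list."""
    return ["docker", "run", "--rm", "-t", "-i",
            "-v", docker_volume_arg(host_dir),
            "-w", "/strainest", image, "/bin/bash"]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(docker_run_command("/home/davide/strainest")))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without Docker.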

Install from source on Ubuntu >= 12.04 and Debian >= 7

We suggest installing the following packages through the package manager:

sudo apt-get update
sudo apt-get install build-essential \
    pkg-config \
    python2.7 \
    python-dev \
    python-pip \
    python-numpy \
    python-scipy \
    python-matplotlib \
    gcc \
    gfortran \
    libblas-dev \
    liblapack-dev \
    libfreetype6 libfreetype6-dev \
    libpng-dev \
    liblzma-dev \
    libbz2-dev

Then, upgrade pip and install the following packages:

sudo pip install --upgrade pip
pip install 'Click>=5.1' 'pandas' 'pysam>=0.9' 'scikit-learn>=0.16.1,<0.20' 'biopython>=1.50'

Download the latest version from https://github.com/compmetagen/strainest/releases and complete the installation:

tar -zxvf strainest-X.Y.Z.tar.gz
cd strainest-X.Y.Z
sudo python setup.py install

Usage

Predict strain profiles

This tutorial requires Sickle (https://github.com/najoshi/sickle), Bowtie2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) and samtools (http://samtools.sourceforge.net/) to be installed on your system.

Download the example data (Illumina paired-end reads):

wget ftp://ftp.fmach.it/metagenomics/strainest/example/reads.tar.gz
tar zxvf reads.tar.gz

Quality-trim the raw reads (e.g. using Sickle):

sickle pe -f reads1.fastq -r reads2.fastq -t sanger -o \
    reads1.trim.fastq -p reads2.trim.fastq -s reads.singles.fastq -q 20

Given the species of interest (e.g. P. acnes), download and untar the precomputed Bowtie2 reference database available at ftp://ftp.fmach.it/metagenomics/strainest/ref/ (e.g. pacnes.tar.gz):

wget ftp://ftp.fmach.it/metagenomics/strainest/ref/pacnes.tar.gz
tar zxvf pacnes.tar.gz

The Bowtie2 database is available in the P_acnes/bowtie directory. At this point we can align the metagenome against the database:

bowtie2 --very-fast --no-unal -x P_acnes/bowtie/align -1 reads1.trim.fastq \
    -2 reads2.trim.fastq -S reads.sam

Now convert the SAM file to BAM, then sort and index the BAM file:

samtools view -b reads.sam > reads.bam
samtools sort reads.bam -o reads.sorted.bam
samtools index reads.sorted.bam

Finally, run the strainest est command to predict the strain abundances:

strainest est P_acnes/snp_clust.dgrp reads.sorted.bam outputdir
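
The steps above can also be chained from a small Python 3 script. The following is a minimal sketch, assuming all the tools are on the PATH and using the same file names as the tutorial; build_pipeline and run_pipeline are hypothetical names, not part of StrainEst. One deliberate difference: the SAM-to-BAM step uses samtools view -o instead of shell redirection, because the commands are passed as argument lists without a shell.

```python
import subprocess


def build_pipeline(reads1, reads2, db_dir, outdir, prefix="reads"):
    """Return the tutorial commands as a list of argument lists."""
    trim1 = prefix + "1.trim.fastq"
    trim2 = prefix + "2.trim.fastq"
    singles = prefix + ".singles.fastq"
    sam = prefix + ".sam"
    bam = prefix + ".bam"
    sorted_bam = prefix + ".sorted.bam"
    return [
        # Quality trimming of the paired-end reads
        ["sickle", "pe", "-f", reads1, "-r", reads2, "-t", "sanger",
         "-o", trim1, "-p", trim2, "-s", singles, "-q", "20"],
        # Alignment against the precomputed Bowtie2 database
        ["bowtie2", "--very-fast", "--no-unal", "-x", db_dir + "/bowtie/align",
         "-1", trim1, "-2", trim2, "-S", sam],
        # SAM to BAM (-o replaces the "> reads.bam" redirection)
        ["samtools", "view", "-b", "-o", bam, sam],
        ["samtools", "sort", bam, "-o", sorted_bam],
        ["samtools", "index", sorted_bam],
        # Strain abundance prediction
        ["strainest", "est", db_dir + "/snp_clust.dgrp", sorted_bam, outdir],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it as well unless dry_run is True."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline("reads1.fastq", "reads2.fastq",
                                "P_acnes", "outputdir"))
```

With dry_run=True (the default) the script only prints the commands; pass dry_run=False to execute them in order.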

In the output directory we can find:

abund.txt
    the predicted abundances for each reference genome;
max_ident.txt
    for each reference genome, the percentage of alleles that are present in the metagenome;
info.txt
    information about the prediction, including the prediction Pearson R;
counts.txt
    the counts for each base at each SNV position;
mse.pdf
    Lasso cross-validation plot as a function of the shrinkage coefficient.
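
To list only the reference genomes with a non-zero predicted abundance, abund.txt can be post-processed with a short Python 3 script. The sketch below rests on an assumption that this README does not confirm: that abund.txt is tab-delimited, with one header row, the genome name in the first column and the abundance in the second. Check the actual file before relying on it. The function name read_abundances is hypothetical.

```python
import csv


def read_abundances(handle, min_abund=0.0):
    """Return (genome, abundance) pairs above min_abund, largest first.

    ASSUMPTION: tab-delimited input with a header row, genome name in
    column 1 and abundance in column 2.
    """
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the (assumed) header row
    result = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        value = float(row[1])
        if value > min_abund:
            result.append((row[0], value))
    result.sort(key=lambda item: item[1], reverse=True)
    return result


if __name__ == "__main__":
    with open("outputdir/abund.txt") as handle:
        for genome, abund in read_abundances(handle):
            print("{}\t{:.4f}".format(genome, abund))
```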

(Optional) Build a custom reference SNV profile

See the Methods section of the paper.

License: GNU General Public License v3.0