cbcrg/kallisto-nf-reproduce

kallisto-nf-reproduce

This repository contains the software, scripts and data to reproduce the RNA-Seq results described in the Nextflow publication.

The repository contains two versions of a traditional Bash-style pipeline, one for Mac and one for Linux (kallisto-mac and kallisto-linux), as well as the Nextflow version of the pipeline, which runs on both platforms (kallisto-nf).

Folder structure

Folder R contains the analysis.R script for determining the overlapping sets
Folder kallisto-linux contains the scripts for running the native (Bash), non-Nextflow version of the pipeline on Linux
Folder kallisto-mac contains the scripts for running the native (Bash), non-Nextflow version of the pipeline on Mac OS X
Folder kallisto-nf contains the Nextflow version of the pipeline for running on any compatible platform

How to replicate the results

Clone the repository

kallisto-nf exists as a git submodule within this repository. To clone the repository together with the submodule, pass the --recursive flag:

git clone --recursive https://github.com/cbcrg/kallisto-nf-reproduce.git
cd kallisto-nf-reproduce
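
If the repository was cloned without --recursive, the kallisto-nf directory will be empty. The following is a minimal sketch of a check for that case, assuming it is run from the repository root. The helper name submodule_present is hypothetical and not part of this repository:

```shell
# Hypothetical helper: succeeds when the given directory exists and is non-empty.
# Defaults to the kallisto-nf submodule directory named in this README.
submodule_present() {
    dir="${1:-kallisto-nf}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}
```

It could be used as `submodule_present || git submodule update --init --recursive`, which fetches the submodule only when the directory is missing or empty.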

Data

All data are available from the original sources and also as a single compressed tarball (~22 GB).

To download and uncompress the data, use the following commands:

mkdir data
wget -O- https://zenodo.org/record/159158/files/kallisto_data.tar.gz | tar xz -C data
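
Before launching a pipeline it may help to confirm that the input files are where the commands in this README expect them. The sketch below checks the three paths those commands use. That the tarball unpacks to exactly this layout is an assumption, and check_data_layout is a hypothetical helper, not a script shipped with the repository:

```shell
# Hypothetical helper: print every expected input path that is missing under
# the given data directory and return non-zero if any are absent.
# The paths are the ones passed to the pipeline commands in this README;
# whether the tarball produces exactly this layout is an assumption.
check_data_layout() {
    root="${1:-data}"
    missing=0
    for path in \
        "raw_reads" \
        "transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa" \
        "exp_info/hiseq_info.txt"
    do
        if [ ! -e "$root/$path" ]; then
            echo "missing: $root/$path"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_data_layout data` from the repository root prints nothing and succeeds when all three inputs are present.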

Original Sources

If you wish to retrieve the data from the original sources, you can find it here:

Reads: All Illumina HiSeq2000 read data can be downloaded from the NCBI SRA under GEO accession GSE37703.
Transcriptome: The transcriptome GRCh38 release 79 (cDNA all) is available from the kallisto website.

Native Linux

Install Kallisto version 0.42.4.

Install Sleuth.

Launch the kallisto Bash pipeline script by running the following command:

./kallisto-linux/kallisto-std.sh \
    data/raw_reads \
    data/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa \
    data/exp_info/hiseq_info.txt \
    results-linux

Native Mac

Install Kallisto version 0.42.4.

Install Sleuth.

Launch the kallisto Bash pipeline script by running the following command:

./kallisto-mac/kallisto-std.sh \
    data/raw_reads \
    data/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa \
    data/exp_info/hiseq_info.txt \
    results-mac
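
Both native pipelines call for kallisto version 0.42.4, so it can be worth confirming which version is installed before a run. The sketch below assumes that `kallisto version` prints a line of the form "kallisto, version 0.42.4"; the two helper names are hypothetical:

```shell
# Hypothetical helper: extract the version number from a line such as
# "kallisto, version 0.42.4" (it takes the last whitespace-separated field).
kallisto_version_from() {
    printf '%s\n' "$1" | awk '{ print $NF }'
}

# Hypothetical helper: succeed only if the reported version equals the
# expected one (default 0.42.4, the version named in this README).
kallisto_version_ok() {
    [ "$(kallisto_version_from "$1")" = "${2:-0.42.4}" ]
}
```

A possible use is `kallisto_version_ok "$(kallisto version)" || echo "warning: expected kallisto 0.42.4" >&2`.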

Nextflow (Mac & Linux)

Install Nextflow with the following command:

curl -fsSL get.nextflow.io | bash

Install Docker following the official installation instructions.

Pull the Docker image used for this experiment (optional):

docker pull cbcrg/kallisto-nf@sha256:9f840127392d04c9f8e39cb72bcd62ff53cfe0492dde02dc3749bf15f1c547f1

Once the read data have been downloaded, the Nextflow version of the pipeline (located in the kallisto-nf directory) can be run from the repository root with the following command:

nextflow run kallisto-nf/kallisto.nf \
    --reads 'data/raw_reads/SRR4933*_{1,2}.fastq' \
    --transcriptome data/transcriptome/Homo_sapiens.GRCh38.rel79.cdna.all.fa \
    --experiment data/exp_info/hiseq_info.txt \
    --output kallisto-nf-results \
    -with-docker
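
The --reads pattern is quoted so that the shell passes it to Nextflow unexpanded. To preview which paired-end files the pattern should pick up, a minimal sketch follows. It only approximates the pattern with a plain shell glob, and list_read_pairs is a hypothetical helper, not part of the pipeline:

```shell
# Hypothetical helper: print "<sample> <read1> <read2>" for every complete
# pair of FASTQ files matching SRR4933*_1.fastq / SRR4933*_2.fastq.
list_read_pairs() {
    reads_dir="${1:-data/raw_reads}"
    for r1 in "$reads_dir"/SRR4933*_1.fastq; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        r2="${r1%_1.fastq}_2.fastq"         # expected mate file
        sample=$(basename "$r1" _1.fastq)   # e.g. the SRR accession
        if [ -e "$r2" ]; then
            echo "$sample $r1 $r2"
        else
            echo "unpaired: $r1" >&2        # report a missing mate on stderr
        fi
    done
}
```

Calling `list_read_pairs data/raw_reads` lists one line per sample; a file without its mate is reported on standard error instead.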
