snlace

"Necklace" pipeline for creating custom transcriptomes from RNA-seq datasets, implemented in Snakemake. The pipeline was first published by Davidson et al. (2017) in Genome Biology. The tools used by the pipeline are not included. I recommend running it either with the corresponding Docker image (https://hub.docker.com/r/sarahhp/snlace/) or in another kind of isolated environment (e.g. conda).

The pipeline consists of two Snakemake workflows. The first, stored in trim_subsample.py, preprocesses the raw data (quality control, trimming and optional subsetting). The second (snecklace/Snakefile) runs the full Necklace pipeline on the processed data and produces some final summary statistics. If preprocessing is not required, run only the second snakemake command in commands.sh.

Quick install and run on a demo dataset:

Clone this repository (not necessary if using the docker image)

git clone https://github.com/sarahhp/snlace.git
cd snlace

Create a conda environment with the necessary packages (not necessary if using the docker image)

conda env create -f snecklace/envs/nlace.yml --name nlace
conda activate nlace

Download demo data

wget https://github.com/Oshlack/necklace/wiki/asserts/sheep_small_demo_data.tar.gz
mkdir -p ../data
tar -C ../data -zxvf sheep_small_demo_data.tar.gz

Install extra tools (not necessary if using the docker image)

g++ -o tools/cluster tools/cluster.c
g++ -o tools/gtf2flatgtf tools/gtf2flatgtf.c
g++ -o tools/make_blocks tools/make_blocks.c

Download pblat from http://icebert.github.io/pblat/ and run make in its top-level directory.
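
Because the tools used by the pipeline are not bundled, it can help to check what is on the PATH before attempting a dry run. The following is only a minimal sketch: the helper name missing_tools is hypothetical (it is not part of this repository), and the tool list in the example call is just what the steps above mention, not a complete list of what the pipeline needs.

```shell
# Hypothetical helper: print the name of every listed command
# that cannot be found on the current PATH (one name per line).
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Example call, using only tools named in the steps above:
# missing_tools git conda snakemake wget tar g++
```

An empty result means every listed command was found. Anything printed still needs to be installed or made available in the active environment.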

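Dry run (-n performs a dry run, -p prints the shell commands, -r prints the reason each rule is executed). Either run

bash commands.sh

or call the second workflow directly:

snakemake -s snecklace/Snakefile -d .. --configfile sheep-demo.json --use-conda -npr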

To run the pipeline for real, remove the -n option from the uncommented line in commands.sh and run it again, or run:

snakemake -s snecklace/Snakefile -d .. --configfile sheep-demo.json --use-conda -pr
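
The dry run and the real run differ only in the -n flag, so the two invocations can be generated from one place. This is a sketch under that assumption: snlace_cmd is a hypothetical helper, not something shipped in commands.sh, and it only prints the command line from this README rather than executing it.

```shell
# Hypothetical helper: print the snakemake command line from this README.
# Pass "dry" as the first argument to include -n (dry run).
snlace_cmd() {
    flags="-pr"
    if [ "${1:-}" = "dry" ]; then
        flags="-npr"
    fi
    printf 'snakemake -s snecklace/Snakefile -d .. --configfile sheep-demo.json --use-conda %s\n' "$flags"
}

# Example: inspect the dry-run command, then the real one.
# snlace_cmd dry
# snlace_cmd
```

Printing the command instead of running it makes it easy to review or paste. For a different dataset, the config file name would need to be changed by hand.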

License

MIT License