hcstubbe / docker_seq

Dockerfiles for sequencing data pipeline

Docker images

  • guppy_gpu: basecalling of ONT MinION raw data with GPU acceleration using the guppy basecaller
  • bonito_gpu: basecalling of ONT MinION raw data with GPU acceleration using the bonito basecaller
  • c3poa: consensus calling of R2C2 reads
  • PyIR: alignment of B and T cell receptor sequences using IgBLAST, as implemented by PyIR
  • longread_stringtie: pipeline for analyzing long reads using StringTie
  • ballgown: plotting and differential expression analysis of StringTie results using Ballgown

How to run the pipeline

  • Run the Docker containers on the data folder in the following order: (1) guppy_gpu OR bonito_gpu, (2) c3poa (only if you used the R2C2 pipeline), (3) PyIR, (4) longread_stringtie, and (5) ballgown. Mount the sample's path (e.g. ~/path/to/sample/) into each container:
    • The data must first be basecalled with guppy_gpu OR bonito_gpu (this step is kept separate so that basecalling can run on a GPU cluster; see the bonito or guppy documentation)
    • On a SLURM-managed cluster, run the remaining steps after installing the images as Charliecloud images (see "Using charliecloud" below) and after basecalling.
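
The run order above can be sketched as a small dry-run helper that prints one `docker run` command per step instead of executing it. This is only an illustration under assumptions not stated in this README: that the image tags match the names listed under "Docker images" (with PyIR tagged as `pyir`, since Docker repository names must be lowercase), that each image's default command processes the mounted folder, and that `/data` is the mount point inside the container (a hypothetical path).

```shell
#!/bin/sh
# Sketch only: prints the docker commands in pipeline order, does not run them.
# ASSUMPTIONS: image tags, the /data mount point, and default container
# commands are hypothetical and not confirmed by this repository.

# List the pipeline steps in order for a given basecaller.
pipeline_steps() {
    basecaller=$1   # guppy_gpu or bonito_gpu
    use_r2c2=$2     # "yes" if the R2C2 pipeline was used
    echo "$basecaller"
    if [ "$use_r2c2" = "yes" ]; then
        echo "c3poa"
    fi
    echo "pyir"     # assumed lowercase tag for the PyIR image
    echo "longread_stringtie"
    echo "ballgown"
}

# Print one docker run command per step, mounting the sample folder.
print_pipeline() {
    sample_dir=$1
    for image in $(pipeline_steps "$2" "$3"); do
        gpu_flag=""
        case "$image" in
            *_gpu) gpu_flag="--gpus all " ;;   # GPU access for the basecallers
        esac
        echo "docker run --rm ${gpu_flag}-v $sample_dir:/data $image"
    done
}

print_pipeline "$HOME/path/to/sample" guppy_gpu yes
```

Printing the commands first makes it easy to check the order and the mount path before anything is run on real data.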

Rationale for using docker

  • Ease of use: dependencies are installed automatically when building the image; after building and testing, the image can be moved between machines/servers
  • Reproducibility: once the image is built, behaviour is stable across machines/servers; behaviour does not change when using the image later
  • Scalability: test on a local machine/laptop, run on a workstation/high performance computing server

Using charliecloud

Convert a Docker image (e.g. c3poa) and export it to a tarball using Charliecloud.

ch-docker2tar [DOCKER IMAGE] ~/

Untar image.

ch-tar2dir [CHARLIE CLOUD IMAGE].tar.gz /path/to/destination

Run the image on the server using charliecloud.

ch-run -w /path/to/destination/[IMAGE] -b ~/path/to/data/ -- sh [SOME SCRIPT].sh
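
The three Charliecloud steps above can be chained for a single image. The sketch below only prints the commands. It assumes that `ch-docker2tar` writes `[IMAGE].tar.gz` into the output directory and that `ch-tar2dir` unpacks into a subdirectory named after the image, which matches the paths used in the commands above. The function name and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the convert / unpack / run commands for one image.
# ASSUMPTIONS: tarball and unpacked directory names follow the image name;
# the script passed to ch-run is a placeholder supplied by the user.

charliecloud_commands() {
    image=$1      # Docker image name, e.g. c3poa
    tar_dir=$2    # where the exported tarball is written
    dest=$3       # where the image is unpacked
    data_dir=$4   # data folder to bind-mount into the container
    script=$5     # script to run inside the container
    echo "ch-docker2tar $image $tar_dir"
    echo "ch-tar2dir $tar_dir/$image.tar.gz $dest"
    echo "ch-run -w $dest/$image -b $data_dir -- sh $script"
}

charliecloud_commands c3poa "$HOME" /path/to/destination "$HOME/path/to/data/" run_step.sh
```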

Submit the job script to the SLURM manager.

sbatch /path/to/script/[SOME SLURM SCRIPT].cmd
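
The contents of the SLURM script are not shown here. As a rough sketch, a batch script wrapping the `ch-run` call could be generated as follows. The job name, log file name, time limit and CPU count are placeholder values, and `write_slurm_script` is a hypothetical helper, not part of this repository; adjust the `#SBATCH` directives to your cluster.

```shell
#!/bin/sh
# Sketch only: writes a minimal SLURM batch script that runs one
# Charliecloud image. ASSUMPTION: all resource values are placeholders.

write_slurm_script() {
    out_file=$1    # batch script to create, e.g. c3poa.cmd
    image_dir=$2   # unpacked Charliecloud image directory
    data_dir=$3    # data folder to bind-mount into the container
    script=$4      # script to run inside the container
    cat > "$out_file" <<EOF
#!/bin/sh
#SBATCH --job-name=docker_seq
#SBATCH --output=docker_seq_%j.log
#SBATCH --time=24:00:00
#SBATCH --cpus-per-task=4

ch-run -w $image_dir -b $data_dir -- sh $script
EOF
}

# Example (not executed here):
#   write_slurm_script c3poa.cmd /path/to/destination/c3poa ~/path/to/data/ run_step.sh
#   sbatch c3poa.cmd
```

In the `--output` pattern, `%j` is replaced by SLURM with the job ID, so each run gets its own log file.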

Inspect SLURM queue.

squeue --clusters=<cluster name>

Inspect a SLURM job by ID.

scontrol --clusters=<cluster name> show jobid=<job id>

License: MIT License