Bacterial genome assembly pipeline

This is a fork of danielwuethrich87/Bacterial_genome_assembly, optimized for SLURM by Simone Oberhansli.

This pipeline assembles Illumina paired-end reads and produces a scaffolded, annotated assembly.

Steps:

  • Read trimming
  • SPAdes de novo assembly
  • Coverage selection (exclusion of scaffolds with low coverage)
  • Prokka annotation
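
The coverage selection step can be illustrated with a small awk filter. This is only a sketch under stated assumptions, not the pipeline's actual implementation: it assumes scaffold headers follow the SPAdes naming scheme (>NODE_<n>_length_<len>_cov_<coverage>), it reads the k-mer coverage that SPAdes writes into the header rather than computing read coverage from a mapping, and both the function name filter_by_coverage and the threshold are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: keep only scaffolds whose header coverage meets a minimum.
# Assumption: headers look like ">NODE_1_length_5000_cov_35.2" (SPAdes style).
# filter_by_coverage is a hypothetical helper, not part of the pipeline.
filter_by_coverage() {
    local min_cov="$1" fasta="$2"
    awk -v min_cov="$min_cov" '
        /^>/ {
            # Split the header on "_" and look for the value after "cov".
            n = split($0, parts, "_")
            keep = 0
            for (i = 1; i < n; i++)
                if (parts[i] == "cov" && parts[i + 1] + 0 >= min_cov) keep = 1
        }
        # Print the header and its sequence lines only while keep is set.
        keep { print }
    ' "$fasta"
}

# Example (hypothetical file name and threshold):
#   filter_by_coverage 5 scaffolds.fasta > scaffolds.filtered.fasta
```

Parsing the header keeps the sketch short; a filter based on mapped reads (for example with bowtie2 and samtools, both listed in the requirements) would measure coverage more directly.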

Requirements:

  • 64-bit Linux system
  • Slurm (tested on version 17.11.7)
  • Vital-IT modules
  • python (version 2.7)
  • SPAdes (version 3.10.1)
  • samtools (version 1.3)
  • prokka (version 1.12)
  • bowtie2 (version 2.3.0)
  • pilon (version 1.2.2, already installed in /software)
  • barrnap (version 0.9, already installed in /software)
  • trimmomatic (version 0.36, already installed in /software)

Installation:

git clone https://github.com/MrTomRod/Bacterial_genome_assembly.git

Usage:

Place the two input read files (forward and reverse) in /input.

run_bacteria_assembly.sh is a SLURM batch script that runs bacteria_assembly_slurm.sh. In most cases, all you have to do is edit the line that begins with $DIR/bacteria_assembly_slurm.sh according to your needs.

$DIR/bacteria_assembly_slurm.sh <Sample_ID> <Reads_R1> <Reads_R2> <Genus_> <species_> <Number_of_cores>

<Sample_ID>               Unique identifier for the sample
<Reads_R1>                Forward read file
<Reads_R2>                Reverse read file
<Genus_>                  Genus name of the bacterial species
<species_>                Species name of the bacterial species
<Number_of_cores>         Number of parallel threads to run (int)
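
The six positional arguments can be sanity-checked before the job is submitted. The following is a minimal sketch, not part of the pipeline: check_args is a hypothetical helper, and the sample ID and file names in the usage comment are made up for illustration.

```shell
#!/usr/bin/env bash
# check_args is a hypothetical helper (not part of the pipeline).
# It mirrors the argument order of bacteria_assembly_slurm.sh:
#   <Sample_ID> <Reads_R1> <Reads_R2> <Genus_> <species_> <Number_of_cores>
check_args() {
    if [ "$#" -ne 6 ]; then
        echo "expected 6 arguments, got $#" >&2
        return 1
    fi
    local sample_id="$1" reads_r1="$2" reads_r2="$3" genus="$4" species="$5" cores="$6"
    local f
    # Both read files must exist and be readable.
    for f in "$reads_r1" "$reads_r2"; do
        if [ ! -r "$f" ]; then
            echo "cannot read input file: $f" >&2
            return 1
        fi
    done
    # The thread count must be a positive integer.
    case "$cores" in
        ''|*[!0-9]*|0)
            echo "number of cores must be a positive integer: $cores" >&2
            return 1
            ;;
    esac
    echo "OK: $sample_id ($genus $species), $cores threads"
}

# Example (hypothetical sample and file names):
#   check_args sample01 input/sample01_R1.fastq.gz input/sample01_R2.fastq.gz Escherichia coli 8
```

Running such a check on the login node catches a misspelled file name before the job waits in the queue.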

A test run on one genome (each input file was ~70M) took about 25 minutes on the IBU cluster. Assembling multiple genomes in one job takes correspondingly longer, so remember to adjust the batch file's maximum run time accordingly (#SBATCH --time=??).
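
A rough value for #SBATCH --time can be derived from the test run above. The sketch below is an estimate only: it assumes genomes are processed one after another in the same job, that run time scales linearly with the number of genomes, and that a twofold safety margin is adequate. The helper name estimate_walltime is hypothetical.

```shell
#!/usr/bin/env bash
# estimate_walltime is a hypothetical helper: it multiplies the number of
# genomes by the minutes per genome and a safety factor, then prints the
# result as HH:MM:SS for use in "#SBATCH --time=".
estimate_walltime() {
    local n_genomes="$1"
    local minutes_per_genome="${2:-25}"   # ~25 min per genome in the test run
    local safety_factor="${3:-2}"         # assumption: 2x margin
    local total=$(( n_genomes * minutes_per_genome * safety_factor ))
    printf '%02d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
}

# Example: four genomes at the default 25 minutes each, with a 2x margin.
#   estimate_walltime 4
```

For four genomes this gives 03:20:00, which would go into the batch file as #SBATCH --time=03:20:00. Larger input files will likely need a larger per-genome value than the 25 minutes measured here.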

Finally, submit the script to SLURM with sbatch run_bacteria_assembly.sh.
