genomic-medicine-sweden / jasen

Bacterial typing pipeline for clinical NGS data. Written in Nextflow, Python & Bash.

Just Another System for Epityping using NGS data

Jasen produces results for epidemiological and surveillance purposes. Jasen has been developed for a small set of microbiota (primarily MRSA), but will likely work with any bacteria with a stable cgMLST scheme.

Requirements

  • Singularity
  • Nextflow (curl -s https://get.nextflow.io | bash)

Recommended

  • Conda
  • Singularity Remote Login

Development deployment (self-contained)

Copy code locally

git clone --recurse-submodules --single-branch --branch master git@github.com:genomic-medicine-sweden/jasen.git && cd jasen

Access to OCI registries (Optional)

singularity remote login

Create singularity images.

Note: The containers that need to be built locally require sudo privileges.

cd container
sudo make build_local_containers
make download_remote_containers
cd ..

Note: The main Makefile also attempts to build and/or download the containers (that is, when you run make install in the main repo folder). Building them with sudo beforehand, as above, prevents the main script from stopping midway to ask for the sudo password at this step.

Download references and databases using singularity.

First, make sure you are in the main jasen folder (if you changed into the container folder earlier, return to the main folder with cd ..). Then run the install make rule:

make install

Finally, run checks:

make check

Any errors produced during this step will hinder pipeline execution in unexpected ways.

Configuration and test data

Config

Source: configs/nextflow.base.config

  • Edit the root parameter in configs/nextflow.base.config
  • Edit the krakenDb, workDir and outdir parameters in configs/nextflow.base.config
  • Edit the runOptions in configs/nextflow.base.config in order to mount directories to your run

Test data

Source: assets/test_data/samplelist.csv

  • Edit the read1 and read2 columns in assets/test_data/samplelist.csv

Setting up temp directories

Source: ~/.bashrc

  • Add the export line to ~/.bashrc
  • Change SINGULARITY_TMPDIR to APPTAINER_TMPDIR if you are using apptainer
export SINGULARITY_TMPDIR="/tmp" #or equivalent filepath to tmp dir
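
For scripted setups, the choice between the two variable names can be made automatically. The sketch below is a hypothetical helper, not part of jasen: it checks which runtime executable is on the PATH and builds the matching export line. The function names, and the decision to prefer apptainer when both executables are found, are assumptions.

```python
import shutil


def tmpdir_variable(which=shutil.which):
    """Return the temp-dir variable name for the installed container runtime."""
    # Apptainer installs often also provide a `singularity` alias,
    # so apptainer is checked first (an assumption of this sketch).
    if which("apptainer"):
        return "APPTAINER_TMPDIR"
    return "SINGULARITY_TMPDIR"


def export_line(tmpdir="/tmp", which=shutil.which):
    """Build the line to append to ~/.bashrc."""
    return f'export {tmpdir_variable(which)}="{tmpdir}"'


print(export_line())
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the defaults are enough.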

Fetching databases

Choose database

Choose between the Kraken DB (64 GB, highly recommended) and the MiniKraken DB (8 GB), or build a custom database of your own.

Download Kraken database

wget -O /path/to/kraken_db/krakenstd.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20230314.tar.gz
tar -xf /path/to/kraken_db/krakenstd.tar.gz -C /path/to/kraken_db

Download MiniKraken database

wget -O /path/to/kraken_db/krakenmini.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20230314.tar.gz
tar -xf /path/to/kraken_db/krakenmini.tar.gz -C /path/to/kraken_db
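
The download-and-unpack steps above can also be scripted, for example when provisioning several machines. The following is a minimal sketch using only the Python standard library. The helper names `download` and `extract` are hypothetical, and the URL is the MiniKraken one from the command above.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL copied from the MiniKraken download command above.
MINIKRAKEN_URL = "https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20230314.tar.gz"


def download(url, target):
    """Download url to target, creating the parent directory if needed."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(target))
    return target


def extract(archive, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return the member names."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Only unpack archives that come from a source you trust.
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest_dir)
    return names
```

A call such as `extract(download(MINIKRAKEN_URL, "/path/to/kraken_db/krakenmini.tar.gz"), "/path/to/kraken_db")` would then place the database files directly in the database folder.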

Updating databases

Update MLST database

bash /path/to/jasen/assets/mlst_db/update_mlst_db.sh

Create personalised TBProfiler database

Install jasentool

git clone git@github.com:ryanjameskennedy/jasentool.git && cd jasentool
pip install .

Create input csv that is used as tbdb input (composed of FoHM, WHO & tbdb variants)

jasentool converge --output_dir /path/to/jasen/assets/tbdb

Create tbdb (ensure tb-profiler is installed)

cd /path/to/jasen/assets/tbdb
tb-profiler create_db --prefix converged_who_fohm_tbdb
tb-profiler load_library converged_who_fohm_tbdb

Bgzip and index gms TBProfiler db

bgzip -c converged_who_fohm_tbdb.bed > /path/to/jasen/assets/tbprofiler_dbs/bed/converged_who_fohm_tbdb.bed.gz
tabix -p bed /path/to/jasen/assets/tbprofiler_dbs/bed/converged_who_fohm_tbdb.bed.gz

Usage

Simple self-test

nextflow run main.nf -profile staphylococcus_aureus -config configs/nextflow.base.config --csv assets/test_data/samplelist.csv

Usage arguments

Argument type   Options                                   Required
-profile        staphylococcus_aureus/escherichia_coli    True
-entry          bacterial_default                         True
-config         nextflow.base.config                      True
-resume         NA                                        False
--output        user specified                            False
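
When the pipeline is launched from a wrapper script, the arguments listed above can be assembled programmatically. The sketch below is a hypothetical helper, not part of jasen: it builds the argument list for `nextflow run` from the documented options and rejects profiles that are not listed as supported. The function name and its defaults are assumptions.

```python
import shlex

# Profiles listed as supported in this README.
SUPPORTED_PROFILES = {"staphylococcus_aureus", "escherichia_coli"}


def build_command(profile, csv, config="configs/nextflow.base.config",
                  entry=None, resume=False, output=None):
    """Assemble a `nextflow run` argument list from the documented options."""
    if profile not in SUPPORTED_PROFILES:
        raise ValueError(f"unsupported profile: {profile}")
    cmd = ["nextflow", "run", "main.nf", "-profile", profile, "-config", config]
    if entry:
        cmd += ["-entry", entry]
    if resume:
        cmd.append("-resume")
    if output:
        cmd += ["--output", output]
    cmd += ["--csv", csv]
    return cmd


cmd = build_command("staphylococcus_aureus", "assets/test_data/samplelist.csv")
print(shlex.join(cmd))
```

With the defaults shown, the printed command is the same as the simple self-test. The list form can be passed to `subprocess.run(cmd, check=True)` without going through a shell.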

Input file format

id,platform,read1,read2
p1,illumina,assets/test_data/sequencing_data/saureus_10k/saureus_large_R1_001.fastq.gz,assets/test_data/sequencing_data/saureus_10k/saureus_large_R2_001.fastq.gz
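
A sample list in this format can be checked before starting a run. The sketch below is a hypothetical validator, not part of jasen: it confirms the four-column header and that the read files exist. Treating an empty read2 as acceptable (for single-end data) is an assumption, as is resolving relative paths against `base_dir`.

```python
import csv
from pathlib import Path

EXPECTED_COLUMNS = ["id", "platform", "read1", "read2"]


def check_samplelist(csv_path, base_dir="."):
    """Return a list of problems found in the sample list (empty if none)."""
    problems = []
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames != EXPECTED_COLUMNS:
            return [f"unexpected header: {reader.fieldnames}"]
        for line_no, row in enumerate(reader, start=2):
            if not row["id"]:
                problems.append(f"line {line_no}: missing id")
            if not row["read1"]:
                problems.append(f"line {line_no}: missing read1")
            for column in ("read1", "read2"):
                # Assumption: an empty read2 means single-end data and is skipped.
                value = row[column]
                if value and not (Path(base_dir) / value).is_file():
                    problems.append(f"line {line_no}: {column} not found: {value}")
    return problems
```

Running `check_samplelist("assets/test_data/samplelist.csv")` from the main jasen folder should return an empty list once the read1 and read2 columns point at real files.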

Component Breakdown

QC

  • Kraken2: Species detection.
  • Bracken: Combined with Kraken2 for species detection.
  • bwa mem: Maps reads to cgMLST loci (demarcated by a BED file) in order to estimate genome coverage. Low levels of intra-species contamination and erroneous mappings are removed by mapping with bwa and filtering out heterozygous mapped bases.
  • interquartile range: Calculates evenness of coverage.
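
To illustrate the last item: the interquartile range (IQR) is the spread between the first and third quartiles, so a smaller IQR of per-position read depth indicates more even coverage. This README does not state how jasen computes the value, so the function below is only a sketch of the plain statistic, with a hypothetical name and hypothetical input (a list of depths).

```python
import statistics


def coverage_iqr(depths):
    """Interquartile range (Q3 - Q1) of per-position read depths."""
    q1, _median, q3 = statistics.quantiles(depths, n=4, method="inclusive")
    return q3 - q1


even = [28, 30, 30, 31, 32, 29, 30, 30]    # depths cluster around 30
uneven = [2, 55, 30, 90, 5, 61, 12, 30]    # depths vary widely
print(coverage_iqr(even), coverage_iqr(uneven))
```

The second sample produces the larger value, which is the signal a QC threshold on coverage evenness would act on.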

Assembly

  • SPAdes: De novo assembly for Ion Torrent.
  • SKESA: De novo assembly for Illumina.
  • QUAST: Extracts QC data (De novo assembly parameters) from the assembly.

Epidemiological typing

  • chewBBACA: Calls cgMLST alleles for the loci defined by the schema. The number of missing loci is calculated and used as a QC parameter.
  • cgmlst.net: The cgMLST reference schema.
  • mlst: Calculates traditional 7-locus MLST.
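
The missing-loci QC parameter mentioned for chewBBACA can be sketched as a count over one sample's allele calls. The rule used here is an assumption rather than jasen's documented logic: a call counts as present if it is an allele number, optionally carrying the `INF-` prefix chewBBACA gives to newly inferred alleles, and anything else (for example `LNF` or `NIPH`) counts as missing. The function names are hypothetical.

```python
def count_missing_loci(calls):
    """Count loci without a usable allele number in one sample's profile."""
    missing = 0
    for call in calls:
        value = str(call).strip()
        # Newly inferred alleles carry an "INF-" prefix but still have a number.
        if value.startswith("INF-"):
            value = value[len("INF-"):]
        if not value.isdigit():
            missing += 1
    return missing


def missing_fraction(calls):
    """Fraction of loci in the profile that are missing."""
    calls = list(calls)
    return count_missing_loci(calls) / len(calls)


profile = ["1", "INF-12", "LNF", "NIPH", "7", "ASM"]
print(count_missing_loci(profile), missing_fraction(profile))
```

A sample whose missing fraction exceeds a chosen threshold would then be flagged for review. The threshold itself is not specified in this README.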

Supported profiles:

  • staphylococcus_aureus
  • escherichia_coli

Future profiles that will be supported:

  • klebsiella_pneumoniae
  • mycobacterium_tuberculosis

Virulence and resistance markers

Relatedness

  • sourmash: Determine relatedness between samples.
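
As background for the relatedness step: sourmash compares samples through compact k-mer sketches, from which it estimates measures such as Jaccard similarity. The sketch below computes the exact Jaccard similarity of two k-mer sets instead of an estimate, and it ignores reverse complements, so it illustrates the underlying quantity and not sourmash itself. The function names and the default k are assumptions.

```python
def kmers(sequence, k=21):
    """Return the set of all k-length substrings of a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(seq_a, seq_b, k=21):
    """Exact Jaccard similarity of the k-mer sets of two sequences."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Two toy sequences sharing two of their four distinct 3-mers.
print(jaccard("ACGTA", "ACGTT", k=3))
```

Values close to 1 indicate nearly identical k-mer content, and values close to 0 indicate unrelated samples.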

Report and visualisation

  • Bonsai: Visualises jasen outputs.
  • GrapeTree: Visualises phylogenetic relationships using cgMLST data.

Tips

  • Always run the latest versions of the bioinformatics software.
  • Verify that you have execute permission for jasen's *.sif images.
  • Old Singularity versions may sporadically produce the error FATAL: could not open image jasen/container/*.sif: image format not recognized!
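
The permission check from the second tip can be automated. The sketch below is a hypothetical helper, not part of jasen: it lists the *.sif files in a folder that the current user cannot execute. The function name is an assumption, and the folder argument would typically be the container folder of the repo.

```python
import os
from pathlib import Path


def non_executable_images(container_dir):
    """Return *.sif files in container_dir that the current user cannot execute."""
    return sorted(
        path for path in Path(container_dir).glob("*.sif")
        if not os.access(path, os.X_OK)
    )
```

An empty result from `non_executable_images("container")` means every image found is executable. Any path returned can be fixed with chmod +x.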

About

License: GNU General Public License v3.0


Languages

Nextflow 52.3%, Makefile 18.7%, Perl 18.6%, Python 9.2%, Shell 1.3%