genomic-medicine-sweden/jasen

Just Another System for Epityping using NGS data

Jasen produces results for epidemiological and surveillance purposes. It was developed for a small set of microorganisms (primarily MRSA), but will likely work with any bacterial species that has a stable cgMLST scheme.

Requirements

Singularity
Nextflow (curl -s https://get.nextflow.io | bash)
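
Before continuing, it can help to confirm that both requirements are on your PATH. The following is a minimal sketch; the helper name have_tool is illustrative and not part of jasen.

```shell
# Hypothetical helper (not part of jasen): succeeds if a command is on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report anything that is missing; the tool names come from the list above.
for tool in singularity nextflow; do
    have_tool "$tool" || echo "missing requirement: $tool" >&2
done
```

If nothing is printed, both tools were found.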

Development deployment (self-contained)

Copy code locally

git clone --recurse-submodules --single-branch --branch master git@github.com:genomic-medicine-sweden/jasen.git && cd jasen

Access to OCI registries (Optional)

singularity remote login

Create singularity images.

Note: The containers that need to be built locally require sudo privileges.

cd container
sudo make build_local_containers
make download_remote_containers
cd ..

Note: The main Makefile also attempts to build and/or download the containers (that is, when you run make install in the main repo folder). Building them with sudo beforehand, as above, prevents the main script from stopping midway to ask for the sudo password.

Download references and databases using singularity.

First, make sure you are in the main jasen folder (if you changed into the container folder earlier, return to the main folder with cd ..). Then run the install make rule:

make install

Finally, run checks:

make check

Resolve any errors reported by this step before running the pipeline; otherwise execution may fail in unexpected ways.

Configuration and test data

Config

Source: configs/nextflow.base.config

Edit the root parameter in configs/nextflow.base.config
Edit the krakenDb, workDir and outdir parameters in configs/nextflow.base.config
Edit the runOptions in configs/nextflow.base.config in order to mount directories to your run
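
As a rough illustration of the runOptions step, the sketch below builds a bind-mount option string from a list of directories. It assumes that runOptions is passed to Singularity as command-line options and that mounting is done with Singularity's --bind option, which accepts a comma-separated list of paths. The helper name bind_opts and the example directories are placeholders, not jasen settings.

```shell
# Hypothetical helper: join directories into one Singularity --bind option.
bind_opts() {
    joined=""
    for dir in "$@"; do
        # Append a comma only when something has already been collected.
        joined="${joined:+$joined,}$dir"
    done
    if [ -n "$joined" ]; then
        printf '%s %s\n' --bind "$joined"
    fi
}

# Placeholder paths; replace with the directories your run needs.
bind_opts /path/to/reads /path/to/kraken_db
```

The printed string is what you would paste into the runOptions value, under the assumptions stated above.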

Test data

Source: assets/test_data/samplelist.csv

Edit the read1 and read2 columns in assets/test_data/samplelist.csv

Setting up temp directories

Source: ~/.bashrc

Add the export line to ~/.bashrc
Change SINGULARITY_TMPDIR to APPTAINER_TMPDIR if you are using apptainer

export SINGULARITY_TMPDIR="/tmp" #or equivalent filepath to tmp dir
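
The choice between the two variable names can be sketched as a small helper that prints the matching export line. The function name tmpdir_export is illustrative only; the variable names are the ones given above.

```shell
# Hypothetical helper: print the export line for the container runtime in use.
# Usage: tmpdir_export singularity|apptainer [tmp_dir]
tmpdir_export() {
    runtime=$1
    dir=${2:-/tmp}
    case "$runtime" in
        apptainer)   var=APPTAINER_TMPDIR ;;
        singularity) var=SINGULARITY_TMPDIR ;;
        *) echo "unknown runtime: $runtime" >&2; return 1 ;;
    esac
    printf 'export %s="%s"\n' "$var" "$dir"
}

# To append the line to ~/.bashrc you could run, for example:
#   tmpdir_export apptainer /tmp >> ~/.bashrc
```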

Fetching databases

Choose database

Choose between the Kraken DB (64 GB, highly recommended) and the MiniKraken DB (8 GB), or customize your own.

Download Kraken database

wget -O /path/to/kraken_db/krakenstd.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_20230314.tar.gz
tar -xf /path/to/kraken_db/krakenstd.tar.gz -C /path/to/kraken_db

Download MiniKraken database

wget -O /path/to/kraken_db/krakenmini.tar.gz https://genome-idx.s3.amazonaws.com/kraken/k2_standard_08gb_20230314.tar.gz
tar -xf /path/to/kraken_db/krakenmini.tar.gz -C /path/to/kraken_db
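
If you script the download, the choice between the two databases can be reduced to a lookup of the URLs listed above. This is only a sketch; kraken_db_url is a made-up helper name and the download itself is left as a comment so nothing is fetched by accident.

```shell
# Hypothetical helper: map a database choice to the download URL given above.
kraken_db_url() {
    base="https://genome-idx.s3.amazonaws.com/kraken"
    case "$1" in
        standard) echo "$base/k2_standard_20230314.tar.gz" ;;
        mini)     echo "$base/k2_standard_08gb_20230314.tar.gz" ;;
        *) echo "usage: kraken_db_url standard|mini" >&2; return 1 ;;
    esac
}

# Example (not run here):
#   wget -O /path/to/kraken_db/kraken.tar.gz "$(kraken_db_url mini)"
```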

Updating databases

Update MLST database

bash /path/to/jasen/assets/mlst_db/update_mlst_db.sh

Create personalised TBProfiler database

Install jasentool

git clone git@github.com:ryanjameskennedy/jasentool.git && cd jasentool
pip install .

Create input csv that is used as tbdb input (composed of FoHM, WHO & tbdb variants)

jasentool converge --output_dir /path/to/jasen/assets/tbdb

Create tbdb (ensure tb-profiler is installed)

cd /path/to/jasen/assets/tbdb
tb-profiler create_db --prefix converged_who_fohm_tbdb
tb-profiler load_library converged_who_fohm_tbdb

Bgzip and index gms TBProfiler db

bgzip -c converged_who_fohm_tbdb.bed > /path/to/jasen/assets/tbprofiler_dbs/bed/converged_who_fohm_tbdb.bed.gz
tabix -p bed /path/to/jasen/assets/tbprofiler_dbs/bed/converged_who_fohm_tbdb.bed.gz

Usage

Simple self-test

nextflow run main.nf -profile staphylococcus_aureus -config configs/nextflow.base.config --csv assets/test_data/samplelist.csv

Usage arguments

Argument type	Options	Required
-profile	staphylococcus_aureus/escherichia_coli	True
-entry	bacterial_default	True
-config	nextflow.base.config	True
-resume	NA	False
--output	user specified	False
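
To show how these arguments combine, here is a small sketch that assembles the command line from a profile and a sample list. It mirrors the self-test command and only uses flags that appear in this README; the wrapper name jasen_cmd is hypothetical, and it prints the command rather than running it.

```shell
# Hypothetical wrapper: print a jasen command line for a profile and CSV.
# Usage: jasen_cmd <profile> <samplelist.csv> [resume]
jasen_cmd() {
    profile=$1
    csv=$2
    # Only the profiles listed as supported in this README are accepted.
    case "$profile" in
        staphylococcus_aureus|escherichia_coli) ;;
        *) echo "unsupported profile: $profile" >&2; return 1 ;;
    esac
    # A non-empty third argument appends -resume.
    printf 'nextflow run main.nf -profile %s -config configs/nextflow.base.config --csv %s%s\n' \
        "$profile" "$csv" "${3:+ -resume}"
}

jasen_cmd staphylococcus_aureus assets/test_data/samplelist.csv
```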

Input file format

id,platform,read1,read2
p1,illumina,assets/test_data/sequencing_data/saureus_10k/saureus_large_R1_001.fastq.gz,assets/test_data/sequencing_data/saureus_10k/saureus_large_R2_001.fastq.gz
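
A sample list in this format can be sanity-checked before a run. The sketch below verifies the header and that every read file exists; check_samplelist is an illustrative name, not a jasen command, and the check assumes simple comma-separated fields without quoting.

```shell
# Hypothetical helper: validate the header and read paths of a sample list.
check_samplelist() {
    csv=$1
    status=0
    lineno=0
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS=, read -r id platform read1 read2 || [ -n "$id" ]; do
        lineno=$((lineno + 1))
        if [ "$lineno" -eq 1 ]; then
            # The first line must be the header shown above.
            if [ "$id,$platform,$read1,$read2" != "id,platform,read1,read2" ]; then
                echo "unexpected header in $csv" >&2
                return 1
            fi
            continue
        fi
        # Skip blank lines.
        if [ -z "$id" ]; then
            continue
        fi
        for f in "$read1" "$read2"; do
            if [ ! -f "$f" ]; then
                echo "$id: missing file $f" >&2
                status=1
            fi
        done
    done < "$csv"
    return "$status"
}
```

Calling check_samplelist assets/test_data/samplelist.csv from the main jasen folder returns a non-zero status if the header differs or any read file is missing.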

Component Breakdown

QC

Kraken2: Species detection.
Bracken: Combined with Kraken2 for species detection.
bwa mem: Maps reads to cgMLST loci (demarcated by a bed file) in order to estimate genome coverage. Low levels of intra-species contamination or erroneous mapping are removed using bwa and by filtering out heterozygous mapped bases.
interquartile range: Calculates evenness of coverage.
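
To make the interquartile range concrete: it is the difference between the third and first quartiles of the per-position coverage values, so a smaller value means more even coverage. The sketch below computes it with sort and awk using linear interpolation between neighbouring values. This is an illustration only; the quartile method and the helper name iqr are assumptions, and jasen's own calculation may differ.

```shell
# Hypothetical helper: read one number per line on stdin, print Q3 - Q1.
iqr() {
    sort -n | awk '
        { v[NR] = $1 }
        # Quantile by linear interpolation between the two nearest ranks.
        function quantile(q,    h, lo, frac) {
            h = (NR - 1) * q + 1
            lo = int(h)
            frac = h - lo
            if (lo >= NR) return v[NR]
            return v[lo] + frac * (v[lo + 1] - v[lo])
        }
        END {
            if (NR == 0) exit 1
            printf "%g\n", quantile(0.75) - quantile(0.25)
        }'
}

# Example: coverage values 10, 20, 30, 40, 50 give Q1 = 20 and Q3 = 40.
printf '10\n20\n30\n40\n50\n' | iqr   # prints 20
```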

Assembly

SPAdes: De novo assembly for Ion Torrent.
SKESA: De novo assembly for Illumina.
QUAST: Extracts QC data (De novo assembly parameters) from the assembly.

Epidemiological typing

chewBBACA: Calls cgMLST alleles as defined by the schema. The number of missing loci is calculated and used as a QC parameter.
cgmlst.net: The cgMLST reference schema.
mlst: Calculates traditional 7-locus MLST.

Supported profiles:

staphylococcus_aureus
escherichia_coli

Future profiles that will be supported:

klebsiella_pneumoniae
mycobacterium_tuberculosis

Virulence and resistance markers

resfinder: Detects antimicrobial resistance genes as well as environmental and chemical resistance genes.
pointfinder: Combines with resfinder to detect variants.
virulencefinder: Detects virulence genes.
amrfinderplus: Detects antimicrobial resistance genes as well as environmental, chemical resistance and virulence genes.
resfinder_db: Resfinder database.
pointfinder_db: Pointfinder database.
virulencefinder_db: Virulencefinder database.

Relatedness

sourmash: Determines relatedness between samples.

Report and visualisation

Bonsai: Visualises jasen outputs.
GrapeTree: Visualises phylogenetic relationships using cgMLST data.

Tips

Always run the latest versions of the bioinformatics software.
Verify that you have execute permission for jasen's *.sif images.
Old Singularity versions may sporadically produce the error FATAL: could not open image jasen/container/*.sif: image format not recognized!
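
The permission tip can be checked with a short loop. This is a sketch with a made-up helper name, list_nonexec_sifs; it prints every *.sif file in a directory that the current user cannot execute.

```shell
# Hypothetical helper: list *.sif images in a directory that lack execute permission.
list_nonexec_sifs() {
    for img in "$1"/*.sif; do
        # Skip the unexpanded pattern when the directory has no *.sif files.
        [ -e "$img" ] || continue
        [ -x "$img" ] || echo "$img"
    done
}

# Example, run from the main jasen folder:
#   list_nonexec_sifs container
```

Any path printed can then be fixed with chmod +x.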
