Kincekara / C-BIRD

:microbe: Bacterial Identification and Antimicrobial Resistance :pill:

C-BIRD

CT-PHL Bacterial Identification and Resistance Detection

Overview

C-BIRD is a pipeline that performs de novo assembly of Illumina paired-end reads and uses k-mer based approaches where they are available. It runs on the Terra.Bio platform as well as on any Linux machine with the Cromwell or miniwdl workflow engine. As its name indicates, C-BIRD is designed only for rapid bacterial identification and antimicrobial resistance detection.

Purpose

The main goal of this project is to create a small, fast, and accurate workflow that runs in a cloud environment with high reproducibility and parallelization. To achieve this, C-BIRD uses a minimal Docker container for each pipeline step. C-BIRD will be validated for a selected set of bacteria.

Scope

C-BIRD has been created with a minimalistic approach. Producing clinically meaningful results and generating individual reports for each sample are within this project's scope. Any typing (except MLST) or further analysis is out of scope. However, extra tools and programs may be added for validation purposes.

Installation

Terra users can add C-BIRD to their existing workspace in Terra via Dockstore.

C-BIRD deliberately avoids auto-updating the necessary databases so that they remain under strict control for validation purposes. The following databases and files should be installed or uploaded manually. Please check the wiki for detailed instructions.

| File | Comments |
| --- | --- |
| Kraken2/Bracken database | Standard 8 (required) |
| Mash sketch | Custom mash sketch (required) |
| Adapters fasta | Your sequencing adapters' list as a fasta file (optional) |
| Target genes fasta | Extra set of genes/proteins as a fasta file containing protein sequences (optional) |
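Because these files are supplied manually, it can help to confirm that the required ones are present before launching a run. The following is a minimal sketch in Python, not part of C-BIRD itself. The labels `kraken2_database`, `mash_reference`, `adapters`, and `target_genes` are hypothetical names chosen for this example and are not the workflow's actual input names, which are described in the wiki.

```python
from pathlib import Path

# Hypothetical labels for the four files listed above; these are NOT the
# workflow's real input names.
REQUIRED = ("kraken2_database", "mash_reference")
OPTIONAL = ("adapters", "target_genes")


def check_inputs(paths):
    """Return a list of problems found in a {label: path} mapping.

    Required entries must be given and exist on disk; optional entries are
    only checked when they are provided.
    """
    problems = []
    for label in REQUIRED:
        value = paths.get(label)
        if not value:
            problems.append(f"missing required input: {label}")
        elif not Path(value).exists():
            problems.append(f"file not found for {label}: {value}")
    for label in OPTIONAL:
        value = paths.get(label)
        if value and not Path(value).exists():
            problems.append(f"file not found for {label}: {value}")
    return problems
```

This check only applies to local paths, for example when running with Cromwell or miniwdl on a Linux machine. Files uploaded to a Terra workspace are referenced by cloud storage URIs, which `Path.exists()` cannot resolve.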

Workflow

C-BIRD uses Kraken2 and Bracken for taxonomic profiling of reads, which serves as a contamination check. In general, a pure isolate can be expected to yield a high abundance estimate. However, there are exceptions caused by database limitations, the nature of k-mer based approaches, and highly similar organisms. Results should be interpreted with these factors in mind.
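As an illustration of how such a contamination check might be read, the sketch below parses a Bracken abundance table and reports whether the dominant taxon reaches a chosen fraction of reads. It assumes Bracken's standard tab-separated output columns (`name`, `fraction_total_reads`, and so on). The 0.90 cutoff and the example table are made up for demonstration and are not values used or recommended by C-BIRD.

```python
import csv
import io

# Hypothetical cutoff for illustration only; not a C-BIRD setting.
MIN_TOP_FRACTION = 0.90


def top_taxon(bracken_text):
    """Return (name, fraction) of the most abundant taxon in a Bracken table."""
    reader = csv.DictReader(io.StringIO(bracken_text), delimiter="\t")
    rows = [(row["name"], float(row["fraction_total_reads"])) for row in reader]
    if not rows:
        raise ValueError("Bracken table has no data rows")
    return max(rows, key=lambda r: r[1])


def looks_pure(bracken_text, min_fraction=MIN_TOP_FRACTION):
    """True when the dominant taxon reaches the chosen abundance fraction."""
    _, fraction = top_taxon(bracken_text)
    return fraction >= min_fraction


# Made-up table in Bracken's column layout, for demonstration only.
EXAMPLE = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Escherichia coli\t562\tS\t9000\t600\t9600\t0.96\n"
    "Shigella flexneri\t623\tS\t300\t100\t400\t0.04\n"
)

name, fraction = top_taxon(EXAMPLE)
print(f"{name}: {fraction:.2f}, pure={looks_pure(EXAMPLE)}")
```

The example pairs two closely related organisms on purpose: a minor fraction assigned to a near neighbor does not necessarily indicate contamination, which is why a fixed threshold should be treated as a starting point rather than a rule.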

Mash is used to determine the identity of bacteria for selected genera with a custom mash sketch (Acinetobacter, Citrobacter, Enterobacter, Escherichia, Klebsiella, Kluyvera, Morganella, Proteus, Providencia, Pseudomonas, Raoultella, Salmonella, Serratia).
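To show how a Mash-based identification can be interpreted, the sketch below picks the closest reference from `mash dist` output, which is tab-separated with five fields: reference ID, query ID, distance, p-value, and shared hashes. This is only an assumption about how the result might be consumed, not a description of C-BIRD's internal logic. The file names and values in the example are invented.

```python
def best_mash_hit(dist_text):
    """Return (reference_id, distance, shared_hashes) for the closest reference."""
    best = None
    for line in dist_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        reference, _query, distance, _pvalue, shared = line.split("\t")
        hit = (reference, float(distance), shared)
        # A smaller Mash distance means a more similar genome.
        if best is None or hit[1] < best[1]:
            best = hit
    if best is None:
        raise ValueError("no Mash hits found")
    return best


# Invented `mash dist` output, for demonstration only.
EXAMPLE_DIST = (
    "Klebsiella_pneumoniae.fna\tsample.fasta\t0.0012\t0\t950/1000\n"
    "Klebsiella_oxytoca.fna\tsample.fasta\t0.0850\t0\t120/1000\n"
    "Raoultella_planticola.fna\tsample.fasta\t0.1100\t0\t70/1000\n"
)

reference, distance, shared = best_mash_hit(EXAMPLE_DIST)
print(reference, distance, shared)
```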

Detection of AMR genes depends on NCBI's AMRFinderPlus program and its database.
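For readers who want to post-process the AMR results themselves, the following sketch groups reported genes by drug class from an AMRFinderPlus tab-separated report. It assumes the column headers `Gene symbol`, `Class`, and `Element type` as used by AMRFinderPlus 3.x; other releases may name these columns differently, so they are exposed as parameters. The example rows are made up and reduced to three columns for brevity.

```python
import csv
import io
from collections import defaultdict


def genes_by_class(amr_text, gene_col="Gene symbol", class_col="Class",
                   type_col="Element type"):
    """Group AMR gene symbols by drug class from an AMRFinderPlus TSV report."""
    grouped = defaultdict(set)
    for row in csv.DictReader(io.StringIO(amr_text), delimiter="\t"):
        if row[type_col] != "AMR":
            continue  # skip non-AMR elements such as STRESS or VIRULENCE
        grouped[row[class_col]].add(row[gene_col])
    return {drug_class: sorted(genes) for drug_class, genes in grouped.items()}


# Made-up, reduced report for demonstration only; a real report has more columns.
EXAMPLE_AMR = (
    "Gene symbol\tElement type\tClass\n"
    "blaKPC-2\tAMR\tBETA-LACTAM\n"
    "blaTEM-1\tAMR\tBETA-LACTAM\n"
    "aac(6')-Ib\tAMR\tAMINOGLYCOSIDE\n"
    "qacEdelta1\tSTRESS\tQUATERNARY AMMONIUM\n"
)

for drug_class, genes in genes_by_class(EXAMPLE_AMR).items():
    print(drug_class, ", ".join(genes))
```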

The following programs and tools are used in the C-BIRD pipeline.

| Tools | Version | Comments |
| --- | --- | --- |
| FastP | 0.23.4 | QC, adapter removal, quality filtering and trimming |
| BBTools | 39.06 | phiX removal & optional normalization |
| Kraken2 | 2.1.3 | Taxonomic profiling & contamination check |
| Bracken | 2.9 | Abundance estimation |
| SPAdes | 4.0.0 | De novo assembly |
| Mash | 2.3 | Bacterial identification |
| QUAST | 5.2.0 | Genome assembly evaluation |
| BUSCO | 5.7.1 | Genomic data quality assessment |
| mlst | 2.23.0 | MLST typing |
| AMRFinderPlus | 3.12.8 | AMR gene identification |
| BLAST+ | 2.15.0 | Target gene search |
| PlasmidFinder | 2.1.6 | Plasmid detection |
| Cbird-Util | 1.2 | Individual summary report generation |

Outputs

In addition to outputs generated in each step by the specific programs, C-BIRD creates additional summary reports in HTML for each sample.

Basic report
Advanced report
QC report

Known issues

SPAdes may fail if an authorization domain is defined for the workspace on Terra.

Additional Notes

C-BIRD includes modified and unmodified code from Theiagen's Public Health Bacterial Genomics workflows. If you need a more sophisticated pipeline, please check Theiagen's TheiaProk workflow.

About

License: GNU Affero General Public License v3.0

