filcfig / PCP

PrecisionCallerPipeline (PCP) automatically takes Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA) FASTQ files and outputs BAM files correctly aligned to the rCRS.

The PrecisionCallerPipeline (PCP)

PCP takes FASTQ files generated by a sequencing facility with the Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA) and automatically outputs aligned BAM files mapped to the revised Cambridge Reference Sequence (rCRS), the commonly used mitochondrial reference sequence.

Prerequisites

The workflow is based on Snakemake and runs on a Linux-based system with the following tools:

  • Awk, for SAM file editing;
  • BEDTools, for BAM to FASTQ conversion;
  • BWA-MEM, for read alignment;
  • Pycision, for amplicon delimitation and selection;
  • RtN!, for NUMT removal;
  • SAMtools, for BAM conversion, sorting, indexing, and merging;
  • Trimmomatic, for read quality control and trimming.
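
Before running the workflow, it can help to confirm that the command-line tools listed above are reachable. The following is a minimal sketch and not part of PCP: the function name check_tools is hypothetical, and the executable names you pass to it (for example bedtools, bwa, samtools, snakemake, java) are assumptions about how the tools are installed on your system.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): report which commands are missing from PATH.
# Returns 0 if every command given as an argument is found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_tools awk bedtools bwa samtools snakemake java prints one line per missing command and returns a non-zero status if any of them is not found. Pycision, RtN!, and Trimmomatic are placed in the tools folder rather than on PATH, so they are not covered by this check.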

Installation

Install the software above and clone this repo to your directory of choice:

git clone https://github.com/filcfig/PCP.git

Add pycision.py, trimmomatic-0.39.jar, and the RtN folder to the tools folder. For RtN, remember to decompress and index its reference FASTA (bunzip2 humans.fa.bz2 && bwa index humans.fa).
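
The contents of the tools folder can be verified with a short check such as the sketch below. It is not part of PCP: the function name check_tools_dir is hypothetical, and it assumes that humans.fa sits directly inside the RtN folder. It relies on the fact that bwa index writes its index files (including a .bwt file) next to the FASTA.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): verify the contents of the tools folder.
# Assumption: humans.fa is located directly inside tools/RtN.
check_tools_dir() {
    dir=${1:-tools}
    status=0
    for f in pycision.py trimmomatic-0.39.jar RtN/humans.fa; do
        if [ ! -e "$dir/$f" ]; then
            echo "not found: $dir/$f" >&2
            status=1
        fi
    done
    # bwa index creates several files next to the FASTA, one of them ending in .bwt.
    if [ ! -e "$dir/RtN/humans.fa.bwt" ]; then
        echo "humans.fa does not appear to be indexed (run: bwa index humans.fa)" >&2
        status=1
    fi
    return "$status"
}
```

Running check_tools_dir tools from the repository root returns a non-zero status and lists what is missing if the folder is incomplete.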

Usage

Start by adding the FASTQ files to the sequencing/selected_fastqfiles folder. Make sure Snakemake is available in your environment (if you use conda, run conda activate snakemake), then make run_FASTQ.sh executable and run it:

chmod +x run_FASTQ.sh
./run_FASTQ.sh
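
Before launching run_FASTQ.sh, a quick count of the input files can catch an empty or misplaced input folder. This is a minimal sketch and not part of PCP: the function name count_fastq is hypothetical, and the .fastq and .fastq.gz extensions are assumptions, so adjust them to match the file names your sequencing facility delivers.

```shell
#!/bin/sh
# Hypothetical helper (not part of PCP): count FASTQ files in the input folder.
# Assumption: input files end in .fastq or .fastq.gz.
count_fastq() {
    dir=${1:-sequencing/selected_fastqfiles}
    n=0
    for f in "$dir"/*.fastq "$dir"/*.fastq.gz; do
        # An unmatched pattern stays literal and is skipped by the -f test.
        if [ -f "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

Calling count_fastq with no argument inspects sequencing/selected_fastqfiles relative to the current directory and prints the number of files found.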

Because RtN takes some time per sample and needs a substantial amount of RAM, you can also process the FASTQ files without RtN by running Snakefile_noRtN instead:

snakemake -s Snakefile_noRtN -j

The final BAM files will be available in the sequencing/merged folder.

Data

The data generated with samples previously sequenced within the 1000 Genomes Project are openly available in Zenodo.

Citation

Our manuscript is published in:

Cortes-Figueiredo, F.; Carvalho, F.S.; Fonseca, A.C.; Paul, F.; Ferro, J.M.; Schönherr, S.; Weissensteiner, H.; Morais, V.A. From Forensics to Clinical Research: Expanding the Variant Calling Pipeline for the Precision ID mtDNA Whole Genome Panel. Int. J. Mol. Sci. 2021, 22, 12031. https://doi.org/10.3390/ijms222112031.

License

Distributed under the MIT License. See LICENSE for more information.
