bvtsu / YeastVC

Yeast variant calling as described in Johnson et al., 2021, eLife

YeastVC

Steps

  1. Demultiplex reads (already done by Illumina)
  • Sorts the sequenced reads of a run into separate files for each sample.
  2. Trim reads
  • Create a Bash script that iterates through the sample basenames (file names without the R1 or R2 suffix) and runs NGmerge on each pair: NGmerge -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz -o sample_merged.fastq.gz
  3. MPI-based parallelized BWA mem alignment to the W303 genome (fastq->SAM)
  • Output is a SAM file: tab-delimited text with information on each individual read and its alignment to the genome.
  4. gatk MarkDuplicatesSpark to mark duplicates and sort
  • MarkDuplicatesSpark uses Apache Spark to parallelize the process and take better advantage of all available resources.
  5. samtools view (SAM->BAM)
  • Output is BAM, a compressed binary version of SAM. It is smaller and can be indexed, which enables efficient random access to the data contained in the file.
  6. gatk base quality score recalibration (BQSR), GATK best practices workflow
  7. gatk ApplyBQSR, GATK best practices workflow
  8. gatk HaplotypeCaller, GATK best practices workflow
  9. vcftools: merge VCFs
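
The trimming step asks for a Bash script that loops over sample basenames. A minimal sketch of such a loop follows. It is not the script used in the paper. The function names (list_basenames, run_all) are hypothetical, and it assumes that all FASTQ files sit in one directory and end in _R1.fastq.gz / _R2.fastq.gz. The NGmerge options are exactly those given in the step above. By default the commands are only printed, so they can be checked before anything runs.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the single-directory layout are assumptions.

# Print the basename (file name minus the _R1.fastq.gz suffix) of every
# R1 file found in the given directory, one per line.
list_basenames() {
    local dir="$1" f
    for f in "$dir"/*_R1.fastq.gz; do
        [ -e "$f" ] || continue            # glob matched nothing
        f="${f##*/}"                       # drop the directory part
        printf '%s\n' "${f%_R1.fastq.gz}"  # drop the R1 suffix
    done
}

# For every basename, print the NGmerge command (default) or execute it
# when the second argument is "run". Samples without an R2 file are skipped.
run_all() {
    local dir="$1" mode="${2:-dry-run}" base r1 r2 out
    list_basenames "$dir" | while IFS= read -r base; do
        r1="$dir/${base}_R1.fastq.gz"
        r2="$dir/${base}_R2.fastq.gz"
        out="$dir/${base}_merged.fastq.gz"
        if [ ! -e "$r2" ]; then
            printf 'skipping %s: no R2 file\n' "$base" >&2
            continue
        fi
        if [ "$mode" = "run" ]; then
            NGmerge -1 "$r1" -2 "$r2" -o "$out"
        else
            printf 'NGmerge -1 %s -2 %s -o %s\n' "$r1" "$r2" "$out"
        fi
    done
}
```

After sourcing the file, run_all ./fastq prints one command per sample, and run_all ./fastq run executes them (NGmerge must be on the PATH). The directory name ./fastq is only an example.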
