brandonjew / fastq-prep

prepares sequencing data for parallelized downstream analysis

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

fastq-prep

Prepares sequencing data for parallelized variant calling. Takes BAM, SAM, CRAM, or FASTQ files with paired-end reads and produces split and compressed FASTQ files. Corresponding reads occur on same line in respective FASTQ file.

Currently filters reads with SAM flags indicating non primary alignment and quality control failure. Also filters reads with SAM flags indicating "first in pair" or "second in pair" when such a read has already been encountered (i.e. Two reads have the same name and are both reported as first in pair). Filtering options are specified in the header of the python file.

usage

  • python fastq_prep.py [output.prefix] [input.bam or input.fq]

  • OR python fastq_prep.py [output.prefix] [input_R1.fq,input_R2.fq]

  • OR [write SAM records to stdout] | python fastq_prep.py [output.prefix]

  • Multiple input files are separated by commas, no spaces

requirements

  • pysam

todo

  • Support generation of interleaved FASTQ chunks.

About

prepares sequencing data for parallelized downstream analysis


Languages

Language:Python 100.0%