realityjunky / MrBam

Query Mutated Reads from a Bam

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

MrBam

Build Status Coverage Status

For a given mutation, query its mutated reads from a BAM, merge the reads by positions and give the unique count.

Prerequisites

  • Python 3.4+
  • Pysam ($pip install pysam)

Usage

$ python -m MrBam.main --help
usage: main.py [-h] [-c CFDNA] [-g GDNA] [-o OUTPUT] [-i INFO] [-q QUAL] [-s]
               [-f] [-v] query

example:
  $ MrBam sample.vcf --cfdna sample_cfdna.bam -o sample_MrBam.vcf --simple

positional arguments:
  query                vcf file contains mutations to query

optional arguments:
  -h, --help           show this help message and exit
  -c, --cfdna CFDNA    bam file contains cfdna reads info. There must be a
                       corresponding .bai file in the same directory
  -g, --gdna GDNA      bam file contains gdna reads info. There must be a
                       corresponding .bai file in the same directory
  -o, --output OUTPUT  output vcf file. Will be overwritten if already exists
  -i, --info INFO      additional infomations about these position
  -q, --qual QUAL      drop bases whose qulity is less than this (default: 20)
  -s, --simple         annotate less infomations into vcf output
  -f, --fast           do not infer origin read size by CIGAR, it can be
                       faster and consume less memory.
  -v, --verbos         output debug info

Performace

#sample  option    bam_size(mb)  vcf_lines  CPU_time(s)  Memory(mb)
Sam3     (default) 194           14978      147          1116
Sam3     --fast    194           14978      129          27
Sam2     (default) 655           33702      500          3162
Sam2     --fast    655           33702      417          28
Sam1     (default) 1620          113066     5952         8377
Sam1     --fast    1620          113066     5785         34
Sam4     (default) 2338          648336     49067        9912
Sam4     --fast    2338          648336     60393        36
  • CPU_time is user + sys
  • Memory may vary accroding to system memory pressure
  • Test on Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz

Under development

About

Query Mutated Reads from a Bam

License:MIT License


Languages

Language:Python 100.0%