akdess / BAFExtract

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

BAFExtract

BAFExtract generates the B-allele frequency shifts from mapped reads alone without the need for a heterozygous variant call set. BAFExtract a part of CaSpER (https://github.com/akdess/CaSpER)

There are two steps:

  1. Generation of the pileup.
  2. Generation of the B-allele frequency profile.

Please open a new "issue" on github or contact authors Akdes.Harmanci@uth.tmc.edu or arif.o.harmanci@uth.tmc.edu for questions.

Installation

Type make to build BAFExtract. The executable is built under bin directory. The code is tested on various Unix based systems.

Usage

Extract BAF values from RNA-Seq bam files

	samtools view <bam_file> | ./BAFExtract -generate_compressed_pileup_per_SAM stdin <genome_list> <sample_dir> 50 0; ./BAFExtract -get_SNVs_per_pileup  <genome_list> <sample_dir> <genome_fasta_pileup_dir> 20 4 0.1 <output_baf_file>

<sample_dir>: the name of sample directory <output_baf_file>: final output

You can download and unzip genome_fasta_pileup_dir files from :

for hg38

for hg19

Or you can create genome_fasta_pileup_dir files for other genomes using the following commands:

BAFExtract -preprocess_FASTA [FASTA file path] [Output directory]

for example:

wget -c http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/chromFa.tar.gz
tar -xvzf chromFa.tar.gz
mkdir ../mm10
FILES=./*fa
for f in $FILES
do
  echo "Processing $f file..."
  BAFExtract -preprocess_FASTA $f ../mm10
done

You can download genome_list files from :

for hg38

generated using the following command:

fetchChromSizes hg38 > hg38.list

for hg19

generated using the following command:

fetchChromSizes hg19 > hg19.list

Example

download example bam file

mkdir test; samtools view SRR1295366.sorted.bam | ./bin/BAFExtract -generate_compressed_pileup_per_SAM stdin hg38.list test 50 0; ./bin/BAFExtract -get_SNVs_per_pileup hg38.list test ./hg38/ 20 4 0.1 test.snp

Note: In the example above ./BAFExtract -generate_compressed_pileup_per_SAM uses mimunum mapping quality threshold 50. Depending on the aligner you used the MAPQ-Values can differ a lot and the setting of [Minimum mapping quality] to 50 could mean that no reads are surviving the filtering, only due to the aligners implementation of MAPQ. (acknowledgements to Tobias Tekath)

About


Languages

Language:C++ 97.9%Language:C 1.8%Language:Makefile 0.3%