Rmulet / MAnorm

MAnorm - identifying differential binding in ChIP-seq data using linear normalization on shared peaks

This repository contains a modified version of MAnorm that addresses some compatibility issues.

For an improved version of MAnorm that runs faster and allows for 2 v 2 comparisons (replicates) using edgeR, please visit the [following repository](https://github.com/ying-w/chipseq-compare/tree/master/MAnorm).

The original can be found here. Published in Genome Biology, 2012.

Changes in this version

  • In MAnorm.sh, the line M <- log2((common_peak_count_read1+1)/(common_peak_count_read2+1)) produces an error due to a change in bedtools coverage behaviour. To fix this, I swapped the input order:

coverageBed -a read1.bed -b unique_peak1.bed → coverageBed -b read1.bed -a unique_peak1.bed

  • MAnorm.r requires the packages MASS, affy and R.basic, but the latter is deprecated and no longer available. Most of its functions have been transferred to R.utils and aroma.light, which can be installed with:

biocLite("aroma.light") ; install.packages(c("R.oo","R.utils","MASS"))

  • The binomial coefficient function, nChooseK, was part of 'R.basic'. It has been replaced with the built-in function 'choose'.
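The coverageBed swap in the first item above follows from bedtools 2.24.0, where coverage started being reported for the -a intervals instead of the -b intervals. Below is a minimal sketch of choosing the argument order from a version string. The helpers coverage_reports_on_a and build_coverage_cmd are hypothetical and not part of MAnorm.sh; the sketch only prints the command instead of running it.

```shell
# Hypothetical helper: succeeds when the given bedtools version is 2.24.0
# or later, i.e. when coverage is reported for the -a intervals.
coverage_reports_on_a() {
    version=${1#v}          # strip a leading "v": v2.30.0 -> 2.30.0
    major=${version%%.*}    # text before the first dot
    rest=${version#*.}      # text after the first dot
    minor=${rest%%.*}       # text between the first and second dot
    [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 24 ]; }
}

# Hypothetical helper: print the coverageBed call that counts reads per peak.
# Arguments: version string, read file, peak file.
build_coverage_cmd() {
    if coverage_reports_on_a "$1"; then
        echo "coverageBed -a $3 -b $2"    # new behaviour: peaks go in -a
    else
        echo "coverageBed -a $2 -b $3"    # old behaviour: reads go in -a
    fi
}

build_coverage_cmd v2.30.0 read1.bed unique_peak1.bed
```

On a real system the version string could be taken from the output of `bedtools --version`.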

Problems with MAnorm (from here)

  • There is something wrong with how the p-values are calculated (see code in MAnorm2.R starting from line 50 for details).
    • pval calculation is not optimized (very slow)
      • It is faster to use choose() and run in parallel
    • Stirling approximation seems to be done incorrectly
      • This calculation is consistent in the MATLAB version (more details in MATLAB MAnorm than R MAnorm)
    • p-values are not symmetric (calculations from x vs y do not give the same p-values as y vs x)
  • The mergeBed command in MAnorm.sh does not actually work (the input needs to be sorted first)
  • Lots of tmp files generated by MAnorm.sh and a lot of steps could be done in parallel
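For the mergeBed item above: bedtools merge expects input sorted by chromosome and then by start position. A minimal sketch of that sorting step follows. The function name sort_bed and the file names in the comment are illustrative assumptions, not part of MAnorm.sh.

```shell
# Hypothetical helper: sort BED records by chromosome, then numerically by start.
# LC_ALL=C makes the chromosome ordering independent of the user's locale.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$@"
}

# Small demonstration on three unsorted records.
printf 'chr2\t5\t9\nchr1\t20\t30\nchr1\t3\t8\n' | sort_bed

# With bedtools installed, the sorted stream could be piped to mergeBed, e.g.:
#   sort_bed peak1.bed | mergeBed -i - > merged_peak1.bed
```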

Pre-requisites

  1. Bedtools installed: http://bedtools.readthedocs.io/en/latest/content/installation.html
  2. R packages installed: affy (from Bioconductor), MASS and R.utils (from CRAN)

HOWTO: enter the following lines in R to install the three packages above

biocLite("affy")
install.packages(c("R.utils","MASS"))

Usage

run command: ./MAnorm.sh sample1_peakfile[BED] sample2_peakfile[BED] sample1_readfile[BED] sample2_readfile[BED] sample1_readshift_length[INT] sample2_readshift_length[INT]
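Since the script takes six positional arguments, a small pre-flight check can catch a missing file or a mistyped shift length before the pipeline starts. This is only a sketch: check_manorm_args is a hypothetical helper, not part of MAnorm.sh, and it assumes the read shift lengths are non-negative integers.

```shell
# Hypothetical pre-flight check for the six MAnorm.sh arguments.
# Assumption: both read shift lengths are non-negative integers.
check_manorm_args() {
    if [ "$#" -ne 6 ]; then
        echo "expected 6 arguments, got $#" >&2
        return 1
    fi
    # The first four arguments are the peak and read files.
    for f in "$1" "$2" "$3" "$4"; do
        if [ ! -r "$f" ]; then
            echo "cannot read input file: $f" >&2
            return 1
        fi
    done
    # The last two arguments are the read shift lengths.
    for n in "$5" "$6"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "read shift length must be an integer: $n" >&2
                return 1
                ;;
        esac
    done
    return 0
}

# Example with placeholder file names and shift values:
#   check_manorm_args s1_peaks.bed s2_peaks.bed s1_reads.bed s2_reads.bed 100 100 \
#       && ./MAnorm.sh s1_peaks.bed s2_peaks.bed s1_reads.bed s2_reads.bed 100 100
```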

MAnorm requires two files per sample: the peaks in BED format, easily retrieved from MACS, and the reads from the original SAM/BAM file in the format chromosome, start, end, strand (+/-). To obtain the latter, we can use the following:

samtools view BAM_FILE | awk -F'\t' '{if ($2==0) {print $3,$4,($4+length($10)-1),"+"} else if ($2==16) {print $3,$4,($4+length($10)-1),"-"}}'
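To see what the awk filter above does without a BAM file at hand, the same logic can be wrapped in a function and fed hand-written SAM records. The function name sam_to_reads and the sample records are made up for illustration. Note that only reads with FLAG 0 (forward strand) or 16 (reverse strand) are kept, so records with any other flag, such as unmapped or paired-end reads, are silently dropped.

```shell
# Hypothetical wrapper around the awk filter: reads SAM records on stdin and
# prints chromosome, start, end and strand for FLAG 0 (+) and FLAG 16 (-).
sam_to_reads() {
    awk -F'\t' '{
        if ($2 == 0)       { print $3, $4, ($4 + length($10) - 1), "+" }
        else if ($2 == 16) { print $3, $4, ($4 + length($10) - 1), "-" }
    }'
}

# Three made-up SAM records: forward, reverse, and unmapped (FLAG 4, skipped).
printf 'r1\t0\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n'  > demo.sam
printf 'r2\t16\tchr2\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n'  >> demo.sam
printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'          >> demo.sam

sam_to_reads < demo.sam
rm -f demo.sam

# With samtools installed, the same function could be used as:
#   samtools view BAM_FILE | sam_to_reads > sample1_reads.bed
```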
