LTLA / csaw

Clone of the Bioconductor repository for the csaw package.

Home Page:https://bioconductor.org/packages/devel/bioc/html/csaw.html

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

filterWindowsGlobal() function running is extremely long

RacheliRap opened this issue · comments

Hi,
I'm trying for the first time to use the package. I want to analysis H3k27me3 CHIP-Seq data, and therefore I use the same parameters shown in the tutorial for this histone modification.
When I run the function filterWindowsGlobal for filtering of low-abundance windows, it takes more than 12 hours. Any idea way? my bam file contain ~5 million reads each.
My code:

library(BiocFileCache)
library(csaw)


## read the metadata file
h3k27me3data <- 
  read.csv("../CSAW_df.csv")
h3k27me3data <- data.frame(apply(h3k27me3data, 2, as.character), stringsAsFactors = F)

## define the blacklist using the predicted repeats from the RepeatMasker software
bfc <- BiocFileCache("local", ask=FALSE)
black.path <- bfcrpath(bfc, file.path("http://hgdownload.cse.ucsc.edu",
                                      "goldenPath/mm10/bigZips/chromOut.tar.gz"))
tmpdir <- tempfile()
dir.create(tmpdir)
untar(black.path, exdir=tmpdir)

## Iterate through all chromosomes.
collected <- list()
for (x in list.files(tmpdir, full=TRUE)) {
  f <- list.files(x, full=TRUE, pattern=".fa.out")
  to.get <- vector("list", 15)
  to.get[[5]] <- "character"
  to.get[6:7] <- "integer"
  collected[[length(collected)+1]] <- read.table(f, skip=3, 
                                                 stringsAsFactors=FALSE, colClasses=to.get)
}

collected <- do.call(rbind, collected)
blacklist <- GRanges(collected[,1], IRanges(collected[,2], collected[,3]))
blacklist

## minimum mapping quality score to 10. 
## We also restrict ourselves to the standard chromosomes
param <- readParam(minq=10, discard=blacklist,
                   restrict=paste0("chr", c(1:19, "X", "Y")))

## Counting reads into windows
win.data <- windowCounts(h3k27me3data$Path, param=param, width=2000,
                         spacing=500, ext=200)

bins <- windowCounts(h3k27me3data$Path, bin=TRUE, width=10000, param=param)
filter.stat <- filterWindowsGlobal(win.data, bins)

Thank you!!

I don't know. There is nothing particularly complex inside the function that would cause it to behave that way. Perhaps you could debug(filterWindowsGlobal) and step through the function (press Enter to go through line-by-line) and identify the step that stalls.