jianhong / ATACseqQC

ATAC-seq Quality Control

Home Page:https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

splitGAlignmentsByCut 'mergeBam' error

GBeattie opened this issue · comments

Hey!

Enjoying the package so far, just having difficulties with this error, details below.

The error arises when I try to split using any more than one chromosome as input to splitGAlignmentsByCut;

files will be output into outPath

outPath <- "splited"
dir.create(outPath)
## shift the coordinates of 5'ends of alignments in the bam file
library(BSgenome.Mmusculus.UCSC.mm10)
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
which <- as(seqinfo(Mmusculus), "GRanges")[1:2]
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE, flag = scanBamFlag(isSecondaryAlignment = FALSE, isUnmappedQuery = FALSE, isNotPassingQualityControls = FALSE, isSupplementaryAlignment = FALSE))
shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
genome <- Mmusculus
objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath)

Error:

Error in value[[3L]](cond) : 
  'mergeBam' 'destination' exists, 'overwrite' is FALSE
  destination: splited/NucleosomeFree.bam

There are still split BAMs in the output directory, however they only contain the 2nd chromosome in this case.

If I include more chromosomes (1-X,Y), it seems to output only the last chromosome in the list (along with same error).

Just the first chromosome (i.e. which <- as(seqinfo(Mmusculus), "GRanges")[1] results in no error, but hoping for all chromosomes in output.

I could try manually setting the mergeBam to overwrite = TRUE, but not sure this is the correct approach.

Thanks for the response, I tried that, same result, maybe I'm missing something? The splitGAlignmentsByCut still produces outputs in the new directory, but again only the one chromosome.

> list.dirs()
[1] "."
> outPath <- "splited"
> dir.create(outPath)
> ## shift the coordinates of 5'ends of alignments in the bam file
> library(BSgenome.Mmusculus.UCSC.mm10)
> library(TxDb.Mmusculus.UCSC.mm10.knownGene)
> which <- as(seqinfo(Mmusculus), "GRanges")[1:21]
> gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)
> gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE, flag = scanBamFlag(isSecondaryAlignment = FALSE, isUnmappedQuery = FALSE, isNotPassingQualityControls = FALSE, isSupplementaryAlignment = FALSE))
> shiftedBamfile <- file.path(outPath, "shifted.bam")
> gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
> txs <- transcripts(TxDb.Mmusculus.UCSC.mm10.knownGene)
> genome <- Mmusculus
> #objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome)
> list.dirs()
[1] "."         "./splited"
> outPath <- "splited_2"
> dir.create(outPath)
> objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=genome, outPath = outPath)
Error in value[[3L]](cond) : 
  'mergeBam' 'destination' exists, 'overwrite' is FALSE
  destination: splited_2/NucleosomeFree.bam

Sorry for the late response, installed from github and looks to be working fine now, thank you very much for the fix!

Hi,
I know this is an old error, but I have the same problem. I'm using ATACseqQC 1.8.5 and R 3.6.1, and I get the same error when I run more than one chromosome. It works with no problems for one chromosome but not for multiple chromosomes. Any help is appreciated.
the error message
Error in value[3L] :
'mergeBam' 'destination' exists, 'overwrite' is FALSE
destination: splitBam/NucleosomeFree.bam

my cod:

load the library

rm(list=ls())
library(ChIPpeakAnno)
library(ATACseqQC)
library(GenomicRanges)
library(EnsDb.Hsapiens.v75)
library(AnnotationDbi)
library(Rsamtools)
library(BSgenome.TroutArrlyShort.NCBI.PRJNA623027)

setwd("/localstorage/ATACseq/result/")
trout_txdb <-makeTxDbFromGFF("/localstorage/TroutNewGenome/OmyArlee_1_1/OmyArrlyShort/GCF_013265735.2_USDA_OmykA_1.1_genomic_Short.gff")
saveDb(trout_txdb, file="Trout.sqlite")
trout_Annota <- loadDb("Trout.sqlite")
txs <- transcripts(trout_txdb)

read bam file

bamfile <- ("/localstorage/ATACseq/fastq_Adapter_cut/alignments_ARl/F1B_Arlsorted_RMmitDup_short.bam")
bamfile.labels <- gsub(".bam", "", basename(bamfile))
#bam file status
#bamQC(bamfileT, outPath = NULL)

generate fragement size distribution

fragSize <- fragSizeDist(bamfile, bamfile.labels)

#bamfile tags to be read in
possibleTag <- combn(LETTERS, 2)
possibleTag <- c(paste0(possibleTag[1, ], possibleTag[2, ]),
paste0(possibleTag[2, ], possibleTag[1, ]))

bamTop100 <- scanBam(BamFile(bamfile, yieldSize = 100),
param = ScanBamParam(tag=unlist(possibleTag)))[[1]]$tag
tags <- names(bamTop100)[lengths(bamTop100)>0]
tags
tags <- tags[tags!="PG"]

outPath <- "splitBam"
if (dir.exists(outPath))
{
unlink(outPath, recursive = TRUE, force = TRUE)
}
dir.create(outPath)

seqlev <-(c( "NC_048565.1","NC_048566.1"))
#,"NC_048567.1","NC_048568.1","NC_048569.1","NC_048570.1","NC_048571.1",

"NC_048572.1","NC_048573.1","NC_048574.1","NC_048575.1","NC_048576.1","NC_048577.1","NC_048578.1",

"NC_048579.1","NC_048580.1","NC_048581.1","NC_048582.1","NC_048583.1","NC_048584.1","NC_048585.1",

#"NC_048586.1","NC_048587.1","NC_048588.1","NC_048589.1","NC_048590.1","NC_048591.1","NC_048592.1",

"NC_050570.1","NC_050571.1","NC_050572.1","NC_048593.1"))

#which <- as(seqinfo(RainbowTrout), "GRanges")
which <- as(seqinfo(RainbowTroutArrly)[seqlev], "GRanges")
gal <- readBamFile(bamfile, tag=tags, which=which, asMates=TRUE, bigFile=TRUE)

shiftedBamfile <- file.path(outPath, "shifted.bam")
gal1 <- shiftGAlignmentsList(gal, outbam=shiftedBamfile)
objs <- splitGAlignmentsByCut(gal1, txs=txs, genome=RainbowTroutArrly, outPath=outPath )
dir(outPath)