jianhong / ATACseqQC

ATAC-seq Quality Control

Home Page: https://jianhong.github.io/ATACseqQC/articles/ATACseqQC.html

Optimizing Time and Memory Usage for Large BAM Files

FarzanehRah opened this issue

Hi,
I've been trying to use ATACseqQC for quality control of some plant BAM files (each exceeding 100 GB in size). However, I ran into difficulties when attempting to run it, even after allocating a large amount of memory on our HPC system. I am looking for ways to optimize both time and memory usage when dealing with such large BAM files, but I couldn't find any relevant option, for example in the shiftGAlignmentsList function.

Many thanks in advance for your assistance.

I suppose you are using bigFile=TRUE when you readBamFile.
You can try to split the BAM file into smaller ones, run shiftGAlignmentsList on each of the small BAMs, and then merge them after running.
To split the BAM file, please refer to samtools view.
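For reference, a minimal sketch of that per-chromosome approach, following the pattern in the package vignette. The file name input.bam and the seqlevel chr1 are placeholders, the BAM is assumed to be indexed, and the tag/what arguments are omitted for brevity; each per-chromosome run should need only a fraction of the memory required for the full file:

```r
library(ATACseqQC)
library(Rsamtools)
library(GenomicAlignments)

bamfile <- "input.bam"   # placeholder path; adjust to your data
seqlev  <- "chr1"        # process one chromosome at a time

## restrict reading to a single chromosome instead of the whole >100 GB file
which <- as(seqinfo(BamFile(bamfile))[seqlev], "GRanges")
gal   <- readBamFile(bamfile, which = which, asMates = TRUE, bigFile = TRUE)

## shift the reads and write the result to a per-chromosome BAM on disk
gal1 <- shiftGAlignmentsList(gal, outbam = paste0(seqlev, ".shifted.bam"))
```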

Hi, thank you for your reply. Yes, I am using bigFile=TRUE. I applied the shiftGAlignmentsList function to one replicate of my samples. However, I encountered issues in the subsequent steps as well. I am wondering if there is a way to parallelize these functions. Even with a 2 TB memory allocation, I am getting an error: long vectors not supported yet: ../../../include/Rinlinedfuns.h:537 Execution halted. Thanks

Could you please first try to split the BAM file into smaller ones, for example one containing just the chr1 reads, with samtools view -bho chr1.bam input.bam chr1
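If it is easier to stay in R, roughly the same split can be sketched with Rsamtools::filterBam instead of samtools; the file names here are placeholders and the BAM is assumed to be indexed:

```r
library(Rsamtools)

bamfile <- "input.bam"   # placeholder path
seqlev  <- "chr1"

## keep only the chr1 reads, analogous to `samtools view -bho chr1.bam input.bam chr1`
which <- as(seqinfo(BamFile(bamfile))[seqlev], "GRanges")
filterBam(bamfile, destination = "chr1.bam",
          param = ScanBamParam(which = which))
```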

Actually, I have the results for one chromosome, but the researcher prefers to have the information for the entire genome, since the results are not interpretable in isolation.
Thanks again.
[Attached figure: test_PT_score]

Simply do them one by one and then merge the split BAM files.
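A rough sketch of that loop-then-merge flow, under the same assumptions as above (placeholder file names; each per-chromosome chunk small enough to fit in memory; Rsamtools::mergeBam is used here for the merge, though samtools merge would work just as well):

```r
library(ATACseqQC)
library(Rsamtools)

bamfile <- "input.bam"                          # placeholder path
seqlevs <- seqnames(seqinfo(BamFile(bamfile)))  # all chromosomes in the BAM header

## shift each chromosome separately so memory stays bounded
shifted <- vapply(seqlevs, function(sl) {
  which <- as(seqinfo(BamFile(bamfile))[sl], "GRanges")
  gal   <- readBamFile(bamfile, which = which, asMates = TRUE, bigFile = TRUE)
  out   <- paste0(sl, ".shifted.bam")
  shiftGAlignmentsList(gal, outbam = out)
  out
}, character(1))

## merge the shifted per-chromosome BAMs into one genome-wide shifted BAM
mergeBam(shifted, destination = "shifted.merged.bam",
         indexDestination = TRUE, overwrite = TRUE)
```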

So, at which point should I merge them? For instance, to generate a PT score plot for the entire genome, I need a gal1 object covering all chromosomes. However, creating a gal1 object from the merged BAM file takes several days.
Thank you again for your time.