zeeev / wham

Structural variant detection and association testing

FATAL: was not able to gather stats on bamfile

gabyrech opened this issue · comments

Dear WHAM team,
I am trying to run WHAM on 7 BAM files, one per sample. I would like to run an association test between one of the samples and the other six, so I am running:

$ WHAM-BAM -f ref.fa -m 2 -q 15 -p 10 -x 10 -t 1.sort.rmdup.bam -b 2.sort.rmdup.bam,3.sort.rmdup.bam,4.sort.rmdup.bam,5.sort.rmdup.bam,6.sort.rmdup.bam,7.sort.rmdup.bam > 1_vs_all.vcf

But I am getting this error:

INFO: WHAM-BAM will using the following fasta: ref.fa
INFO: WHAM-BAM requires : 2 soft-clips to score breakpoint
INFO: WHAM-BAM skip soft-clips with average base quality below: 15
INFO: WHAM-BAM skip soft-clips with mapping quaity below: 10
INFO: OpenMP will roughly use 10 threads
INFO: target bams:
1.sort.rmdup.bam
INFO: background bams:
2.sort.rmdup.bam
3.sort.rmdup.bam
4.sort.rmdup.bam
5.sort.rmdup.bam
6.sort.rmdup.bam
7.sort.rmdup.bam
INFO: gathering stats for each bam file.
FATAL: was not able to gather stats on bamfile: 1.sort.rmdup.bam

All the BAM files were generated with BWA-MEM and processed with samtools version 0.1.19. Here are the commands:
$ bwa mem -R '@RG\tID:1\tSM:1' -M -t 18 ref.fa 1_R1.PF.fastq.gz 1_R2.PF.fastq.gz | samtools view -S -b - > 1.bam
$ samtools sort 1.bam 1.sort.bam
$ samtools rmdup 1.sort.bam.bam 1.sort.rmdup.bam

Any idea what is going on? Or at least something different to try?
Thanks in advance,
Gabriel

Gabriel,

Thank you for reporting this error. All of your commands look correct. The subroutine that throws the error tries to sample reads from the first 20 sequences in the header. How many @SQ lines are in your headers? Are there reads for every @SQ in the header?

I can resolve this bug today.

Hello Zev,
Thanks for your reply. I actually have 654 @SQ lines in the headers. The reference genome is an "uncompleted" assembly (654 contigs), and some contigs are very small (2000 bp) and full of repetitive sequences, so it is possible that some of them do not contain any mapped reads.
Thanks for trying to fix it!
Gabriel
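
The two questions above (how many @SQ lines, and whether every sequence has reads) can be checked from the shell. This is a minimal sketch, not part of WHAM: the helper names count_sq and empty_contigs are hypothetical, and samtools idxstats needs a coordinate-sorted BAM that has been indexed with samtools index.

```shell
# Hypothetical helpers (not part of WHAM or samtools); both read from stdin.

# Count @SQ lines in a SAM header.
count_sq() {
    grep -c '^@SQ'
}

# From `samtools idxstats` output (name, length, mapped, unmapped),
# print the sequences that have no mapped reads.
empty_contigs() {
    awk -F'\t' '$1 != "*" && $3 == 0 { print $1 }'
}

# Usage, assuming 1.sort.rmdup.bam has been indexed:
#   samtools view -H 1.sort.rmdup.bam | count_sq
#   samtools idxstats 1.sort.rmdup.bam | empty_contigs
```

If the second command lists contigs that fall among the first 20 header sequences, that would be consistent with the sampling failure described above.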

option 1 - the fastest

Change the following line:
unsigned int max = 20; to unsigned int max = 500;

https://github.com/jewmanchue/wham/blob/master/src/bin/multi-wham-testing.cpp#L627

and then run make.

option 2 - hopefully end of day

I'm going to add a command line option to change the number of seqids sampled.

Thanks again.
--Zev
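
Option 1 can also be applied with sed instead of editing by hand. A minimal sketch, assuming the source line reads exactly "unsigned int max = 20;" as quoted above and that the command is run from the top of the wham checkout; the function name raise_seqid_limit is hypothetical. If the same text appears elsewhere in the file, those lines would be changed too, so check the resulting diff before building.

```shell
# Hypothetical helper: rewrite the hard-coded limit in the given source file.
# A .bak copy of the original file is kept next to it.
raise_seqid_limit() {
    sed -i.bak 's/unsigned int max = 20;/unsigned int max = 500;/' "$1"
}

# Usage from the top of the wham source tree:
#   raise_seqid_limit src/bin/multi-wham-testing.cpp && make
```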

Great! I tried with "the fastest" and WHAM is running now!
Thank you very much for such a speedy solution!
Gabriel

Great. In case you're wondering, the number was set to 20 for the human folks, to avoid sampling bait sequences.

I'm leaving this open until I provide a command line option.

--Zev

Hi Zev,

I am getting the same error with one of my bam files.

$ ~/bin/WHAM-BAM -f ../human_g1k_v37.fasta -p 15 -q 30 -m 5 -x 24 -t 150924_13-11719_Merged.bam > 150924_13-11719
INFO: WHAM-BAM will using the following fasta: ../human_g1k_v37.fasta
INFO: WHAM-BAM skip soft-clips with mapping quaity below: 15
INFO: WHAM-BAM skip soft-clips with average base quality below: 30
INFO: WHAM-BAM requires : 5 soft-clips to score breakpoint
INFO: OpenMP will roughly use 24 threads
INFO: target bams:
150924_13-11719_Merged.bam
INFO: gathering stats for each bam file.
FATAL: was not able to gather stats on bamfile: 150924_13-11719_Merged.bam

I tried changing unsigned int max = 20; to unsigned int max = 500; and then ran make, and I get this error:
src/bin/mergeIndv.cpp:350: warning: comparison between signed and unsigned integer expressions

It then exits, and when I try running WHAM again, it still throws the same error.

Do you know what's going on here?

Thanks for the help.
Ashini.

Wham is trying to randomly sample windows. It assumes that each SQ in the header is fully represented in the BAM file. The code that is causing you trouble is:

https://github.com/zeeev/wham/blob/master/src/bin/multi-wham-testing.cpp#L622

If you get rid of that block, it will sample until it gets enough reads.

I'm leaving this open as a bug.

--Zev

Thanks a ton, Zev. This fixed it.

Thanks,
Ashini

Caution for exome data:

If the breakpoint isn't in the capture region, wham will not call it.

Sure. I got it.

Thanks.

I guess this might be a suitable thread for my question. I am trying to use WHAM on my WGS data (~30X) and am wondering about the settings for bwa mem. I used the bwa command:
'bwa mem fastq.R1 fastqR2'

but it does not seem to give me reads with XA tags when I check them using samtools view.

Given the command above, should I use the '-M' option to mark shorter split hits as secondary? Does this affect XA and/or SA tags? Please correct me if I am wrong, or let me know if there is a recommended command line for bwa mem.

Cheers,
Joon

@sehrrot What version of bwa-mem are you using?

Setting the -M flag is fine; not setting -M is fine too. I think most other tools want -M.
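
Whether a BAM carries SA or XA tags, and whether split hits were marked secondary (FLAG bit 0x100, which is what -M sets) or supplementary (FLAG bit 0x800), can be counted directly from the alignment records. A minimal sketch: the helper name count_split_info is hypothetical, your.bam is a placeholder, and the input is SAM text on stdin.

```shell
# Hypothetical helper; reads SAM records from stdin and reports how many
# carry SA/XA tags and how many are flagged secondary or supplementary.
count_split_info() {
    awk -F'\t' '
        /^@/ { next }                                  # skip header lines
        {
            if (int($2 / 256)  % 2) secondary++        # FLAG bit 0x100
            if (int($2 / 2048) % 2) supplementary++    # FLAG bit 0x800
            for (i = 12; i <= NF; i++) {               # optional tags start at column 12
                if ($i ~ /^SA:Z:/) sa++
                if ($i ~ /^XA:Z:/) xa++
            }
        }
        END {
            printf "SA=%d XA=%d secondary=%d supplementary=%d\n",
                   sa, xa, secondary, supplementary
        }'
}

# Usage:
#   samtools view your.bam | count_split_info
```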

Hi Zev,

I am trying to run WHAM on some single-end data, but it has been more than 10 hours and it still says "INFO: gathering stats for each bam file."
I have already removed this block: https://github.com/zeeev/wham/blob/master/src/bin/multi-wham-testing.cpp#L622
so it does not throw the error and is still running. Moreover, there are only 84 @SQ lines in the header, and it's a pretty small BAM file with about 71000 reads.

Do you think it might be because it's single-end data? Please let me know.

Thanks,
Ashini