brentp / mosdepth

fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

misleading output in `summary.txt` file

nservant opened this issue · comments

Hi,

We had an issue which might be interesting for you (and other users).
Whe have a WES sample, and after a bug on a storage system, we only get reads on the chr1 after doing an intersectBed of the bam with the bed target file. Nothing related to mostdepth so far.

However, when we compute the coverage of this bam file with ;

mosdepth -t 4 -n --quantize 0:1:10:50:100: --by Twist_Exome_Core_Covered_Targets_hg38.bed PREFIX onTarget.bam

It reports a very high total mean coverage (181.34X) in the summary file (grep total_region)

chrom	length	bases	mean	min	max
chr1	248956422	937505185	3.77	0	2887
chr1_region	3358574	609028999	181.34	0	2887
total	248956422	937505185	3.77	0	2887
total_region	3358574	609028999	181.34	0	2887

So to me this is misleading, as the total_region does not seem to take into account the full bed, but only the non-zero chromosomes.

Of note, this is confirmed using the regions.bed.gz file but the 'real' mean coverage should be 18.36X.

>>zcat regions.bed.gz | awk '$1=="chr1"{ z=($3-$2); s=s+z*$4; c=c+z}END{print s/c}'
181.336
>>zcat test.regions.bed.gz | awk '{ z=($3-$2); s=s+z*$4; c=c+z}END{print s/c}'
18.3644

the issue is also true for the file mosdepth.region.dist.txt

Thanks for the clear report. I see what you mean. Due to the way I wrote that part, it's not an easy fix. I'll have a closer look next week but it might be that this will remain a (now) known issue.

Thank you @nservant for documenting this issue, I've run into the same unfortunately.

Any progress on a fix? If not, can you note this behavior more prominentely in the README as a known issue? It's not a "known" issue otherwise.

I'm using mosdepth to evaluate regional enrichment from Oxford Nanopore adaptive sampling. The difference between total_region and manual calculation from region.bed.gz changed our wet bench plans for a re-run of valuable samples. I sent this issue to colleagues who have been relying on tailing summary.txt as well.

We're running version 0.3.6 from bioconda. Thank you, really appreciate the speed and ease of use of the package for this application.