brentp / mosdepth

fast BAM/CRAM depth calculation for WGS, exome, or targeted sequencing

Report all chromosomes/regions even though they have zero coverage

xsoleacha opened this issue · comments

Hi Brent,

I have a suggestion for mosdepth that would make its behavior more consistent regardless of the input data. The issue is that when a whole chromosome (for whatever reason) has zero coverage, it is not reported in the summary file. Likewise, chromosomes with zero coverage are not reported at all in the per-base file.

Since the per-base and the summary files are supposed to report coverage for the whole genome, my suggestion would be to add regions and chromosomes with zero coverage, making the output of mosdepth consistent regardless of the coverage of the input data.

Example of the summary file that I get right now (chr8 and chr9 are missing):
chr7 159138663 751 0.00 0 2
chr7_region 10152 231 0.02 0 1
chr10 135534747 594 0.00 0 2
chr10_region 3672 224 0.06 0 2

Suggested output:
chr7 159138663 751 0.00 0 2
chr7_region 10152 231 0.02 0 1
chr8 <length_of_chr8> 0 0.00 0 0
chr8_region <length_of_chr8_region> 0 0.00 0 0
chr9 <length_of_chr9> 0 0.00 0 0
chr9_region <length_of_chr9_region> 0 0.00 0 0
chr10 135534747 594 0.00 0 2
chr10_region 3672 224 0.06 0 2
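Until this is supported in mosdepth itself, the suggested summary could be approximated by post-processing. The following is only a minimal Python sketch, not part of mosdepth: the function name `fill_missing_summary` and the `expected_lengths` mapping are hypothetical, and the lengths are assumed to come from elsewhere (for example a FASTA index for chromosomes and the BED file for regions). It assumes the six whitespace-separated columns shown above and no header line.

```python
def fill_missing_summary(lines, expected_lengths):
    """Return summary lines with a zero-coverage row for every missing name.

    lines            -- existing summary lines (name, length, bases, mean, min, max)
    expected_lengths -- hypothetical dict mapping each expected name to its length,
                        in the order the rows should appear
    """
    seen = {}
    for line in lines:
        fields = line.split()
        if fields:
            seen[fields[0]] = line.rstrip("\n")

    out = []
    for name, length in expected_lengths.items():
        if name in seen:
            out.append(seen[name])
        else:
            # Zero bases, zero mean, zero min and max, as in the suggested output.
            out.append(f"{name}\t{length}\t0\t0.00\t0\t0")

    # Names present in the input but not expected are kept at the end.
    out.extend(row for name, row in seen.items() if name not in expected_lengths)
    return out
```

Rows that already exist are passed through unchanged, so only the missing names are synthesized.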

Example of per-base file that I get right now:
chr7 0 27184209 0
chr7 27184209 27184271 1
chr7 27184271 55242388 0
chr7 55242388 55242389 1
chr7 55242389 55242412 2
chr7 55242412 56604537 0
chr7 56604537 56604678 1
chr7 56604678 110828782 0
chr7 110828782 110828949 1
chr7 110828949 140453080 0
chr7 140453080 140453247 1
chr7 140453247 140481371 0
chr7 140481371 140481538 1
chr7 140481538 159138663 0
chr10 0 89717480 0
chr10 89717480 89717616 1
chr10 89717616 89720664 0
chr10 89720664 89720764 1
chr10 89720764 89720773 2
chr10 89720773 89720872 1
chr10 89720872 107340261 0
chr10 107340261 107340313 1
chr10 107340313 118073651 0
chr10 118073651 118073840 1
chr10 118073840 135534747 0

Suggested output:
chr7 0 27184209 0
chr7 27184209 27184271 1
chr7 27184271 55242388 0
chr7 55242388 55242389 1
chr7 55242389 55242412 2
chr7 55242412 56604537 0
chr7 56604537 56604678 1
chr7 56604678 110828782 0
chr7 110828782 110828949 1
chr7 110828949 140453080 0
chr7 140453080 140453247 1
chr7 140453247 140481371 0
chr7 140481371 140481538 1
chr7 140481538 159138663 0
chr8 0 <length_chr8> 0
chr9 0 <length_chr9> 0
chr10 0 89717480 0
chr10 89717480 89717616 1
chr10 89717616 89720664 0
chr10 89720664 89720764 1
chr10 89720764 89720773 2
chr10 89720773 89720872 1
chr10 89720872 107340261 0
chr10 107340261 107340313 1
chr10 107340313 118073651 0
chr10 118073651 118073840 1
chr10 118073840 135534747 0
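As a stopgap, the same gap-filling can be done on the per-base output after the fact. The sketch below is an illustration only and does not reflect how mosdepth works internally: `fill_missing_per_base` and `chrom_lengths` are hypothetical names, the chromosome lengths are assumed to be available separately (for example from the BAM header or a FASTA index), and the records are assumed to be grouped by chromosome in the same order as `chrom_lengths`.

```python
from itertools import groupby


def fill_missing_per_base(records, chrom_lengths):
    """Yield per-base records, adding one zero-depth interval per missing chromosome.

    records       -- iterable of (chrom, start, end, depth), grouped by chromosome
    chrom_lengths -- hypothetical dict of chromosome name -> length, in reference order
    """
    order = list(chrom_lengths)
    index = {name: i for i, name in enumerate(order)}
    next_i = 0  # position of the next chromosome that has not been emitted yet

    for chrom, group in groupby(records, key=lambda rec: rec[0]):
        i = index[chrom]  # raises KeyError for a chromosome not in chrom_lengths
        if i < next_i:
            raise ValueError(f"{chrom} is out of reference order")
        # Chromosomes skipped before this one had no coverage at all.
        for name in order[next_i:i]:
            yield (name, 0, chrom_lengths[name], 0)
        yield from group
        next_i = i + 1

    # Chromosomes after the last covered one are also missing.
    for name in order[next_i:]:
        yield (name, 0, chrom_lengths[name], 0)
```

Because it is a generator, the records are streamed rather than loaded into memory, which matters for whole-genome per-base files.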

This modification would make it easier to parse the output of mosdepth when chromosomes are missing.

Thanks as always for all your effort!

XS.

Hi Xavier, I agree this would be useful, and it is suggested in another open issue. I just haven't had the time to implement it. I'll try to get to it soon.

Dear Brent,

Thank you for your quick reply and for your efforts maintaining the tool. It is very useful for us!

Best regards,

Xavi.