Very high panel sequencing depth issue: different depth output from mosdepth, bedtools coverage, and sambamba

Question

ipstone opened this issue 8 months ago · comments

Hey Brent and everyone, thank you for all the awesome tools.

We have some high-coverage panel sequencing data, but checking the depth of the regions with mosdepth, bedtools, and sambamba gives quite a range of results (obtained by running the commands below through a Snakemake file).

These tools are run with their default settings. What might cause such a huge difference in the depth calculations?
Thanks in advance!

sambamba: 
"sambamba depth region -L bed/study_genes.bed {input} > coverage/study_sambamba/interval_coverage/{wildcards.sample}_interval_coverage.txt"

# chrom chromStart  chromEnd    F3  readCount   meanCoverage    sampleName
1   36349022    36349047    NM_001317122_cds_0_0_chr1_36349023_f    502 400 Sample-10
1   36354027    36354211    NM_001317122_cds_1_0_chr1_36354028_f    958 372.63  Sample-10
1   36358157    36358278    NM_001317122_cds_2_0_chr1_36358158_f    593 309.859 Sample-10
1   36358697    36358879    NM_001317122_cds_3_0_chr1_36358698_f    777 323.709 Sample-10
1   36359274    36359411    NM_001317122_cds_4_0_chr1_36359275_f    834 374.27  Sample-10
1   36359637    36359772    NM_001317122_cds_5_0_chr1_36359638_f    686 315.548 Sample-10
1   36359915    36360003    NM_001317122_cds_6_0_chr1_36359916_f    669 357.909 
...


bedtools:
"bedtools coverage -mean -a bed/study_genes.bed -b {input} > coverage/study/interval_coverage/{wildcards.sample}_interval_coverage.txt"

chr start   end gene    coverage    sample
1   36349022    36349047    NM_001317122_cds_0_0_chr1_36349023_f    43840.9609375   Sample-10
1   36354027    36354211    NM_001317122_cds_1_0_chr1_36354028_f    43905.3320312   Sample-10
1   36358157    36358278    NM_001317122_cds_2_0_chr1_36358158_f    26675.0253906   Sample-10
1   36358697    36358879    NM_001317122_cds_3_0_chr1_36358698_f    32416.6210938   Sample-10
1   36359274    36359411    NM_001317122_cds_4_0_chr1_36359275_f    54923.9648438   Sample-10
1   36359637    36359772    NM_001317122_cds_5_0_chr1_36359638_f    35807.59375 Sample-10
1   36359915    36360003    NM_001317122_cds_6_0_chr1_36359916_f    29420.7265625   Sample-10

mosdepth:
mosdepth -n --by bed/study_genes.bed coverage/study_mosdepth/interval_coverage/{wildcards.sample}-interval {input}
        gzip -dc coverage/study_mosdepth/interval_coverage/{wildcards.sample}-interval.regions.bed.gz > {output.interval_coverage}

chr start   end gene    coverage    sample
1   36349022    36349047    NM_001317122_cds_0_0_chr1_36349023_f    289.08  Sample-10
1   36349022    36349047    NM_012199_cds_0_0_chr1_36349023_f   289.08  Sample-10
1   36354027    36354211    NM_001317122_cds_1_0_chr1_36354028_f    287.42  Sample-10
1   36354027    36354211    NM_012199_cds_1_0_chr1_36354028_f   287.42  Sample-10
1   36358157    36358278    NM_001317122_cds_2_0_chr1_36358158_f    277.69  Sample-10
1   36358157    36358278    NM_012199_cds_2_0_chr1_36358158_f   277.69  Sample-10
1   36358173    36358278    NM_001317123_cds_2_0_chr1_36358174_f    278.95  Sample-10
1   36358697    36358879    NM_001317122_cds_3_0_chr1_36358698_f    283.55  Sample-10
...

Brent Pedersen · Answer 1 · Wed Sep 13 2023 21:41:04 GMT+0800 (China Standard Time)

Hi Isaac, if you look through the issues, there are a lot of questions like this. I think that mosdepth does a good job of giving a sane answer. Reasons why the tools can differ:

mosdepth does not look at base quality, so it counts every base as covered even if its quality is very low.
mosdepth has different defaults for mapping quality; I think by default it includes all reads.
mosdepth does not double-count overlapping pairs. If r1 and r2 from a fragment overlap, it counts the overlapped bases only once, not twice. You can skip this overlap correction by using --fast-mode.
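
To make the last two points concrete, here is a minimal toy sketch in Python. It is not how mosdepth or any of the other tools is actually implemented; the `Read` class, the `mean_depth` function, and the three reads are all hypothetical and exist only for illustration. It shows how a mapping-quality cutoff and the choice to count overlapping mates once or twice each change the mean depth of the same region.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Read:
    fragment: str  # hypothetical name shared by both mates of a pair
    start: int     # 0-based, inclusive
    end: int       # 0-based, exclusive
    mapq: int      # mapping quality


def mean_depth(reads, region_start, region_end, min_mapq=0, merge_mates=True):
    """Mean per-base depth over [region_start, region_end) for toy reads."""
    depth = Counter()
    seen = set()  # (fragment, position) pairs that were already counted
    for r in reads:
        if r.mapq < min_mapq:
            continue  # read filtered out by the mapping-quality cutoff
        for pos in range(max(r.start, region_start), min(r.end, region_end)):
            if merge_mates:
                if (r.fragment, pos) in seen:
                    continue  # the mate already covered this base
                seen.add((r.fragment, pos))
            depth[pos] += 1
    return sum(depth.values()) / (region_end - region_start)


reads = [
    Read("frag1", 0, 10, 60),  # r1
    Read("frag1", 5, 15, 60),  # r2, overlaps r1 on positions 5..9
    Read("frag2", 0, 10, 0),   # a MAPQ 0 read
]

print(mean_depth(reads, 0, 10))                     # 2.0: mates merged, all reads kept
print(mean_depth(reads, 0, 10, merge_mates=False))  # 2.5: overlap counted twice
print(mean_depth(reads, 0, 10, min_mapq=1))         # 1.0: MAPQ 0 read dropped
```

Even on three reads, the same region reports 1.0, 2.0, or 2.5 depending on these two choices alone, so tools with different defaults should not be expected to agree exactly.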

I suggest trying mosdepth with different mapping-quality values that make sense to you, and trying --fast-mode to see how much difference it makes. I'm not sure how bedtools is getting so much higher coverage, but I suspect you'll get mosdepth and sambamba to nearly agree with --fast-mode.
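
One way to run that comparison systematically is to generate one mosdepth command per combination of settings. The sketch below is only a suggestion: the helper `mosdepth_cmd`, the output prefixes, the file name `sample.bam`, and the MAPQ values 0, 1, and 20 are assumptions chosen for illustration, while `-n`, `--by`, `--mapq`, and `--fast-mode` are mosdepth options. It prints the commands rather than running them, so they can be reviewed or pasted into a Snakemake rule.

```python
import shlex


def mosdepth_cmd(prefix, bam, bed, mapq=0, fast_mode=False):
    """Build a mosdepth argument list for one combination of settings."""
    cmd = ["mosdepth", "-n", "--by", bed, "--mapq", str(mapq)]
    if fast_mode:
        cmd.append("--fast-mode")  # skips the mate-overlap correction
    cmd += [prefix, bam]
    return cmd


# Hypothetical inputs; replace with your own BAM and BED paths.
bam = "sample.bam"
bed = "bed/study_genes.bed"

for mapq in (0, 1, 20):
    for fast in (False, True):
        prefix = f"sample-Q{mapq}-{'fast' if fast else 'default'}"
        print(shlex.join(mosdepth_cmd(prefix, bam, bed, mapq, fast)))
```

Each run writes its own `<prefix>.regions.bed.gz`, so the per-region means can be compared side by side against the sambamba and bedtools tables.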