38 / d4-format

The D4 Quantitative Data Format

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Bug: off-by-one bug on bam reads

andypexai opened this issue · comments

I'll make a genome of 30 bases, and my BAM has one read, 21 bases in length, that spans from bases 5-25 (1-based start) or 4-25 (0-based start):

    ██████FAKE_READ██████
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
         1         2         3
123456789012345678901234567890

It has this as a BAM:

$ samtools view -h test.bam
@HD     VN:1.5  SO:coordinate
@SQ     SN:chr1 LN:30
@PG     ID:samtools     PN:samtools     VN:1.19.2       CL:samtools view -h test.bam
FAKE_READ       0       chr1    5       1       21M     *       0       0       *       *       AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:21 YT:Z:UU

It should be 4 bases depth 0, followed by 21 bases depth 1, followed by 5 bases at depth 0. See the samtools output below:

$ samtools depth -a test.bam
chr1    1       0
chr1    2       0
chr1    3       0
chr1    4       0
chr1    5       1
chr1    6       1
chr1    7       1
chr1    8       1
chr1    9       1
chr1    10      1
chr1    11      1
chr1    12      1
chr1    13      1
chr1    14      1
chr1    15      1
chr1    16      1
chr1    17      1
chr1    18      1
chr1    19      1
chr1    20      1
chr1    21      1
chr1    22      1
chr1    23      1
chr1    24      1
chr1    25      1
chr1    26      0
chr1    27      0
chr1    28      0
chr1    29      0
chr1    30      0

And just to provide a little more info, here are a couple bedtools operations:

$ bedtools bamtobed -i test.bam
chr1    4       25      FAKE_READ       1       +
$ printf "chr1\t0\t30\n" > chr1.bed
$ bedtools coverage -a test.bam -b chr1.bed    
chr1    4       25      FAKE_READ       1       +       4       25      0,0,0   1       21,     0,      1       21      21      1.0000000

Everything looks as expected to me. Now I'll make the d4 file:

$ d4tools create -q=1 test.bam test.d4
$ d4tools view test.d4
chr1    0       4       0
chr1    4       26      1
chr1    26      30      0

Here it appears that the d4tools believes my read is 22 bases long, so the end coordinate is off by one, more specifically it's +1 more than it should be.

@38 this does look like a bug. Do you possibly have time to have a look?