Bug in view? different values when including the first position in a chromosome
mrvollger opened this issue · comments
Hello,
I am running into a weird issue where if I include the "0" base of a chromosome in my view command I get a different result than if I start with the 1 position. E.g.
$ d4tools view fdr.peaks.d4 chr1:0-100000
chr1 0 100000 0 0 0 0
$ d4tools view fdr.peaks.d4 chr1:1-100000 | head
chr1 1 60258 0 0 0 0
chr1 60258 60310 0 0 1 0
chr1 60310 60427 0 0 0 1
chr1 60427 60481 0 0 1 0
chr1 60481 60637 0 0 0 1
chr1 60637 60657 0 0 1 0
chr1 60657 60810 0 0 0 1
chr1 60810 60869 0 0 1 0
chr1 60869 61019 0 0 0 1
chr1 61019 61025 0 0 1 0
I thought this might be because 0 is invalid with the region format, but I also get zeros when I don't select any region at all:
$ d4tools view fdr.peaks.d4 | head -n 2
chr1 0 248956422 0 0 0 0
chr2 0 242193529 0 0 0 0
I have my file uploaded here in case that is helpful:
https://eichlerlab.gs.washington.edu/help/mvollger/tracks/fiberseq/trackHub_GM12878_2022-08-18/fdr.peaks.d4
And this is my version of d4tools
d4tools --version
D4 Utilities Program 0.3.8(D4 library version: 0.3.7)
Usage: d4tools <subcommand> <args>
Possible subcommands are:
create Create a new D4 depth profile
framedump Dump The container data
index Index related operations
ls-track List all available tracks in the D4 file
merge Merge existing D4 file as a multi-track D4 file
plot Plot the specified region
show Print the underlying depth profile
stat Run statistics on the given file
view Same as show
Type 'd4tools <subcommand> --help' to learn more about each subcommands.
Thanks in advance,
Mitchell
Thanks Mitchell, we are looking into this.
Hi @mrvollger, thanks for reporting this bug. Please don't mind my delayed reply for some personal reasons.
I confirmed this is a d4 bug and should be fixed. After investigating the bug, I realized that the output
chr1 0 100000 0 0 0 0
is expected, while
chr1 1 60258 0 0 0 0
chr1 60258 60310 0 0 1 0
chr1 60310 60427 0 0 0 1
chr1 60427 60481 0 0 1 0
chr1 60481 60637 0 0 0 1
chr1 60637 60657 0 0 1 0
chr1 60657 60810 0 0 0 1
chr1 60810 60869 0 0 1 0
chr1 60869 61019 0 0 0 1
chr1 61019 61025 0 0 1 0
is wrong.
The d4 file you provided actually have all none-zero value in chr11.
This bug actually caused by how d4 uses the frame index - the binary search inside the index doesn't actually check if the item is on the same chromosome and this causes values defined for chr11 mistakenly returned for the query of chr1.
And I have pushed the fix
$ target/debug/d4tools view /tmp/repo.d4 chr1:0-100000
chr1 0 100000 0
$ target/debug/d4tools view /tmp/repo.d4 chr1:1-100000
chr1 1 100000 0
Please let me know if you have any further question.
Thanks!
Hao
@38 thanks so much! I appreciate the fix :)
Excellent, thanks @38. Might be time for a new formal release!