Odd results from stat command
arq5x opened this issue
Aaron Quinlan commented
Consider the following sparse D4 file created from an ENCODE bigwig file:
wget https://www.encodeproject.org/files/ENCFF405ZDL/@@download/ENCFF405ZDL.bigWig
time d4utils create -S ENCFF405ZDL.bigWig ENCFF405ZDL.bigwig.d4
Now, create 10 random 100 bp intervals in BED format on which to compute stats:
bedtools random -n 10 -l 100 -g human.hg38.genome | sort -k1,1 -k2,2n | cut -f 1-3 > test.bed
cat test.bed
chr1 31636177 31636277
chr13 41648360 41648460
chr13 90235345 90235445
chr14 106470082 106470182
chr15 54171880 54171980
chr16 1337049 1337149
chr20 60706629 60706729
chr4 19356795 19356895
chr6_GL000252v2_alt 3355374 3355474
chrX 151995155 151995255
Now, run stat on those regions:
d4tools stat ENCFF405ZDL.bigwig.d4 --stat mean --region test.bed
chr1 31636177 31636277 42949651.4
chr13 41648360 41648460 42949660.07
chr13 90235345 90235445 42949672.68
chr14 106470082 106470182 0
chr15 54171880 54171980 42949631.28
chr16 1337049 1337149 42949658.92
chr20 60706629 60706729 42949630.46
chr4 19356795 19356895 42949550.23
chr6_GL000252v2_alt 3355374 3355474 0
chrX 151995155 151995255 42949671.58
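The nonzero means above can be checked against a wraparound hypothesis. Because each mean is reported for a fixed-length interval, mean × length recovers the implied sum, and its distance below 2^32 shows whether it looks like a small negative number reinterpreted as a 32-bit unsigned integer. Below is a minimal awk sketch. The function name wrap_deficit and the 2^31 cut-off are illustrative assumptions, not part of d4tools, and the check tests a hypothesis rather than identifying the root cause.

```shell
# Hypothetical diagnostic helper (not part of d4tools).
# Input: `d4tools stat --stat mean` rows: chrom start end mean.
# For each row, rebuild the implied interval sum (mean * length) and, if it is
# above 2^31, print how far it sits below 2^32. Small deficits are consistent
# with a negative sum wrapped to a 32-bit unsigned value (an assumption).
wrap_deficit() {
  awk 'BEGIN { OFS = "\t" }
       {
         len = $3 - $2            # interval length in bp (BED half-open)
         sum = $4 * len           # implied sum of per-base values
         if (sum > 2^31) {
           # distance below 2^32, rounded to the nearest integer
           printf "%s\t%s\t%s\t%.0f\n", $1, $2, $3, 2^32 - sum
         } else {
           print $1, $2, $3, "-"  # not near the 32-bit boundary
         }
       }'
}
```

To use it, pipe the stat output into it, e.g. d4tools stat ENCFF405ZDL.bigwig.d4 --stat mean --region test.bed | wrap_deficit. For the first row, 42949651.4 × 100 = 4294965140, which is 2156 below 2^32 = 4294967296.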
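It looks like there is some sort of overflow or underflow affecting several of the reported means: every nonzero value falls just below 2^32 / 100 ≈ 42949673, which is implausible for a 100 bp region. For example, let's look at the exact depths for the first of those regions using view: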
d4tools view ENCFF405ZDL.bigwig.d4 chr1:31636177-31636277
chr1 31636176 31636276 0
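To cross-check stat against view for any region, the view records can be averaged directly. Note that view reported 31636176-31636276 for the region string chr1:31636177-31636277, one base off from the BED interval, so the region-string and BED coordinate conventions appear to differ. The sketch below uses a hypothetical helper, view_mean, that takes 0-based, half-open bounds and assumes bases without a record count as 0 (an assumption about how stat averages).

```shell
# Hypothetical helper (not part of d4tools): length-weighted mean of
# `d4tools view` records (chrom start end value, 0-based half-open),
# clipped to the query interval [qstart, qend). Bases with no record
# contribute 0 to the sum (assumption).
view_mean() {
  local qstart=$1 qend=$2
  awk -v qs="$qstart" -v qe="$qend" '
    BEGIN { qs += 0; qe += 0 }      # force numeric comparisons
    {
      rs = $2 + 0; re = $3 + 0
      s = (rs > qs) ? rs : qs       # clip record start to the query
      e = (re < qe) ? re : qe       # clip record end to the query
      if (e > s) total += (e - s) * $4
    }
    END { printf "%.2f\n", total / (qe - qs) }'
}
```

For example, d4tools view ENCFF405ZDL.bigwig.d4 chr1:31636177-31636277 | view_mean 31636177 31636277 gives 0.00 for the single zero-valued record above, versus the 42949651.4 reported by stat.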
The problem disappears when the D4 file is created as a dense file instead:
d4utils create ENCFF405ZDL.bigWig ENCFF405ZDL.bigwig.d4
d4tools stat ENCFF405ZDL.bigwig.d4 --stat mean --region test.bed
chr1 31636177 31636277 0
chr13 41648360 41648460 0.17
chr13 90235345 90235445 0
chr14 106470082 106470182 0
chr15 54171880 54171980 0
chr16 1337049 1337149 0
chr20 60706629 60706729 0
chr4 19356795 19356895 0
chr6_GL000252v2_alt 3355374 3355474 0
chrX 151995155 151995255 0
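Comparing the two runs row by row makes the discrepancy explicit. A minimal sketch, assuming each stat output was saved to its own file from the same BED so the rows line up; compare_stats, sparse.txt and dense.txt are hypothetical names.

```shell
# Hypothetical helper: print rows where two `d4tools stat` outputs disagree.
# Assumes both files have 4 columns (chrom start end value) and the same
# row order, i.e. they were produced from the same BED file.
compare_stats() {
  paste "$1" "$2" | awk '
    # warn (on stderr) if the two files are not aligned row by row
    $1 != $5 || $2 != $6 || $3 != $7 { print "coordinate mismatch at row " NR > "/dev/stderr"; next }
    # report rows whose values differ numerically
    ($4 + 0) != ($8 + 0) { print $1, $2, $3, "sparse=" $4, "dense=" $8 }'
}
```

For example, redirect each d4tools stat run into sparse.txt and dense.txt and then run compare_stats sparse.txt dense.txt. With the outputs above, every row except chr14 and chr6_GL000252v2_alt would be printed, since those two are 0 in both runs.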