38 / d4-format

The D4 Quantitative Data Format

Wrong mean coverage values from indexed d4 file

Jakob37 opened this issue

Leaving this for reference. While working on #78, I also tried running on indexed d4 files, and got strange results there as well.

I traced it in the code, and it did not seem to end up in the same location as the one for #78, so this could be a separate issue.

I get consistently very high coverage results for the indexed d4 files. The numbers differ from those for the unindexed file. Occasionally the coverage matches what I get by manually summing the output from d4tools show, but generally it doesn't.
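
For anyone wanting to reproduce the manual check, here is a minimal sketch of it. It assumes that d4tools show prints bedGraph-style rows (chrom, start, end, depth) with half-open intervals; the function name `mean_depth` and the sample rows are hypothetical and are not taken from the file in this issue.

```python
def mean_depth(rows, chrom, start, end):
    """Length-weighted mean depth over [start, end) from bedGraph-style rows.

    Assumption: each row is "chrom start end depth" with half-open intervals.
    Bases in the region that no row covers are counted as depth 0.
    """
    total = 0
    for line in rows:
        fields = line.split()
        if len(fields) < 4 or fields[0] != chrom:
            continue
        row_start, row_end, depth = int(fields[1]), int(fields[2]), float(fields[3])
        # Clip each row to the query region before weighting by its length.
        overlap = min(row_end, end) - max(row_start, start)
        if overlap > 0:
            total += overlap * depth
    return total / (end - start)


# Made-up rows for illustration only, not real output from the file above.
example_rows = [
    "19\t100\t110\t40",
    "19\t110\t120\t50",
    "19\t120\t140\t45",
]
print(mean_depth(example_rows, "19", 105, 125))
```

The key point is that each row is weighted by how many bases of it fall inside the region, so the result is comparable to the per-region mean that d4tools stat reports.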

$ d4tools stat --region data/intervals_19_only.bed ~/data/hg002_coverage.indexed.d4 
19      45769709        45782552        279.89371642139685
19      4090321 4124122 137.56891807934676
19      49635292        49640143        660.9381570810142
19      55151767        55157773        505.96353646353646

Expected results are:

19      4090321 4124122 47.764267329368955
19      45769709        45782552        45.254146227516934
19      49635292        49640143        46.97567511853226
19      55151767        55157773        37.25041625041625

The bed file:

19      45769709        45782552
19      4090321 4124122
19      49635292        49640143
19      55151767        55157773