flanglet / kanzi

Fast lossless data compression in Java

"index out of range" in BWT transform

pzo opened this issue

Running kanzi on a truncated "bib" file from the Calgary test suite causes a panic with the BWT transform:

./Kanzi -compress -input=bib-truncated -output=bib.kanzi -transform=bwt -entropy=none -overwrite

Kanzi 1.0 (C) 2017, Frederic Langlet
Encoding ...
panic: runtime error: index out of range

goroutine 5 [running]:
kanzi/transform.(*DivSufSort).ssMultiKeyIntroSort(0xc420072980, 0xb273, 0xb37, 0xe14, 0x2)
/home/user/go/src/kanzi/transform/DivSufSort.go:1261 +0x530
kanzi/transform.(*DivSufSort).ssSort(0xc420072980, 0xb273, 0xb37, 0xe14, 0x522d, 0x6046, 0x2, 0x104a0, 0x0)
/home/user/go/src/kanzi/transform/DivSufSort.go:452 +0x2ac
kanzi/transform.(*DivSufSort).sortTypeBstar(0xc420072980, 0xc42009f800, 0x100, 0x100, 0xc42020a000, 0x10000, 0x10000, 0x104a0, 0xc420072980)
/home/user/go/src/kanzi/transform/DivSufSort.go:280 +0x8c8
kanzi/transform.(*DivSufSort).ComputeSuffixArray(0xc420072980, 0xc4201e6000, 0x104a0, 0x104a0, 0x104a1, 0x104a1, 0x1b600)
/home/user/go/src/kanzi/transform/DivSufSort.go:112 +0xdb
kanzi/transform.(*BWT).Forward(0xc42006a8a0, 0xc4201e6000, 0x104a0, 0x104a0, 0xc4201f8003, 0x104a1, 0x104a1, 0x0, 0xc420040c80, 0x40cd6d, ...)
/home/user/go/src/kanzi/transform/BWT.go:137 +0x12e
kanzi/function.(*BWTBlockCodec).Forward(0xc42000c0b0, 0xc4201e6000, 0x104a0, 0x104a0, 0xc4201f8000, 0x104a4, 0x104a4, 0x4396bb, 0x10, 0x10100000053d300, ...)
/home/user/go/src/kanzi/function/BWTBlockCodec.go:73 +0x110
kanzi/function.(*ByteTransformSequence).Forward(0xc42000ade0, 0xc4201e6000, 0x104a0, 0x104a0, 0xc4201f8000, 0x104a4, 0x104a4, 0x0, 0x0, 0x0, ...)
/home/user/go/src/kanzi/function/ByteTransformSequence.go:85 +0x1eb
kanzi/io.(*EncodingTask).encode(0xc4200cc380)
/home/user/go/src/kanzi/io/CompressedStream.go:466 +0xb97
created by kanzi/io.(*CompressedOutputStream).processBlock
/home/user/go/src/kanzi/io/CompressedStream.go:391 +0x39d

bib.zip

Hmm, that can't be good.
Ok, the file works with the Java and C++ implementations, so I messed up something in the port to Go.
Thanks for the report. I will take a deeper look and fix it when I have some time.
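
For anyone wanting to cross-check a suffix-array-based BWT on small inputs, a naive rotation-sort reference can be useful. The sketch below is not kanzi's code: `naiveBWT` is a hypothetical helper, it runs in roughly O(n² log n) time, and its primary-index convention (the row of the unrotated input in the sorted rotation matrix) may differ from the one kanzi uses.

```go
package main

import (
	"fmt"
	"sort"
)

// naiveBWT is a hypothetical reference helper, not part of kanzi.
// It sorts all rotations of src and returns the last column of the
// sorted rotation matrix plus the row index of the unrotated input.
func naiveBWT(src []byte) ([]byte, int) {
	n := len(src)
	if n == 0 {
		return nil, 0
	}

	// rot[i] holds the start offset of the i-th rotation.
	rot := make([]int, n)
	for i := range rot {
		rot[i] = i
	}

	// Compare two rotations byte by byte, wrapping around the input.
	sort.SliceStable(rot, func(a, b int) bool {
		x, y := rot[a], rot[b]
		for k := 0; k < n; k++ {
			cx, cy := src[(x+k)%n], src[(y+k)%n]
			if cx != cy {
				return cx < cy
			}
		}
		return false
	})

	out := make([]byte, n)
	primary := 0
	for i, r := range rot {
		// Last byte of the rotation starting at offset r.
		out[i] = src[(r+n-1)%n]
		if r == 0 {
			primary = i
		}
	}
	return out, primary
}

func main() {
	out, primary := naiveBWT([]byte("banana"))
	fmt.Printf("%s %d\n", out, primary) // prints "nnbaaa 3"
}
```

Comparing its output with the fast implementation on short slices of a failing file is one way to narrow down where a port diverges.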

Ok. Fixed. It was a silly bug with a simple fix.

./Kanzi -compress -input=/tmp/bib-truncated -output=none -entropy=none -transform=bwt

Kanzi 1.0 (C) 2017, Frederic Langlet
Encoding ...

Encoding: 4 ms
Input size: 66720
Output size: 66739
Ratio: 1.000285
Throughput (KB/s): 16289