Compression/decompression of multibyte characters fails
iconara opened this issue · comments
Using lz4-ruby 0.3.1:
```ruby
input = 'Ķ' * 100
output = LZ4.uncompress(LZ4.compress(input))
output.force_encoding(Encoding::UTF_8)
input.should == output # false
```
The exact characters don't matter; it just seems to matter that they are multibyte. Is there a `String#size` call somewhere that should have been a `String#bytesize`?
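For context on why this distinction matters: `String#size` counts characters, while `String#bytesize` counts bytes, and the two diverge for multibyte UTF-8 input. A minimal sketch of the discrepancy (plain Ruby, no lz4-ruby required):

```ruby
# 'Ķ' (U+0136) encodes to two bytes in UTF-8, so character count
# and byte count diverge for this string.
input = 'Ķ' * 100

puts input.size     # 100 characters
puts input.bytesize # 200 bytes
```

Passing the character count where the byte length is expected would truncate a buffer like this to half its real size, which is consistent with the failed round-trip above.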
So I see that a few of the last commits actually change calls from `#size` to `#bytesize`, so is this fixed? Will there be a release soon?
> So I see that a few of the last commits actually change calls from `#size` to `#bytesize`, so is this fixed? Will there be a release soon?
Yes, I will soon release a version that fixes this problem.
lz4-ruby 0.3.2 is released.
iconara, I realized that the issue I was having is actually the same multi-byte problem you were having; I mischaracterized it in the bug I originally filed. I now see that it comes down to multi-byte strings, because after I force-encode my string as UTF-8, the uncompressed size differs from that of the original input string.
Just tested the 0.3.2 fix out and it's working for me. Thanks, komiya-atsushi!