komiya-atsushi / lz4-ruby

Ruby bindings for LZ4 (Extremely Fast Compression algorithm)

Home Page: https://rubygems.org/gems/lz4-ruby

Compression/decompression of multibyte characters fails

iconara opened this issue

Using lz4-ruby 0.3.1:

input = 'Ķ' * 100
output = LZ4.uncompress(LZ4.compress(input))
output.force_encoding(Encoding::UTF_8)
input.should == output # false

The exact characters don't matter; what seems to matter is that they are multibyte. Is there a String#size call somewhere that should have been a String#bytesize?
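To illustrate the suspected cause, here is a minimal sketch in plain Ruby that does not call lz4-ruby at all. It assumes, hypothetically, that a native binding is handed String#size as the buffer length; the byteslice call only stands in for that behavior and is not taken from the gem's source.

```ruby
# "\u0136" is the character used in the report above (two bytes in UTF-8).
input = "\u0136" * 100

char_count = input.size      # number of characters
byte_count = input.bytesize  # number of bytes in the UTF-8 encoding

# Hypothetical stand-in for a binding that reads only `size` bytes
# from the underlying buffer instead of `bytesize` bytes.
truncated = input.byteslice(0, char_count)

puts char_count          # 100
puts byte_count          # 200
puts truncated == input  # false: half of the data was never read
```

For ASCII-only input the two lengths coincide, which would explain why the problem only shows up with multibyte characters.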

So I see that a few of the last commits actually change calls from #size to #bytesize, so is this fixed? Will there be a release soon?

Yes, I will soon release a version that fixes this problem.

lz4-ruby 0.3.2 is released.

iconara, I realized that the issue I was having is actually the same multibyte problem you were having. I mischaracterized it in the bug that I originally filed. I now see that the issue has to do with multibyte strings: after I call force_encoding with UTF-8 on my string, the resulting uncompressed size is different from that of the original input string.

Just tested the 0.3.2 fix out and it's working for me. Thanks, komiya-atsushi!