pieroxy / lz-string

LZ-based compression algorithm for JavaScript

How to set up a compressUTF8() variant

nielsnl68 opened this issue

Hi, could you give me some hints on how to change compressToUTF16() into a compressToUTF8()?
I use another library to encrypt the string, but it can't handle UTF-16 at the moment.

Can you tell us why you need a UTF-8 implementation? It makes little to no sense at all.

You mention encryption, but UTF (8 or 16) is about encoding. Can you tell us more about what you're trying to do?

Hello @pieroxy,

Thanks for your reply. In my case, I use your compression module for communication between the server and the clients over AJAX calls, and for compressed URLs that activate mailbox connections.
At the moment, all communication between them is UTF-8 encoded. I tried your compressToUTF16 solution, but the data was not sent correctly; or rather, the receiving side saw some end-of-file bytes and discarded the rest of the data.

At the moment I use the Base64 solution, but now the packets are twice as big.
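Some back-of-the-envelope arithmetic helps compare the two. Counted in characters, Base64 carries 6 payload bits per character and compressToUTF16 carries 15, so Base64 needs 15 / 6 = 2.5 times as many characters. Counted in UTF-8 bytes on the wire, the picture can change. This sketch assumes, from the lz-string 1.4.x source, that compressToUTF16 writes each 15-bit value as the code point value + 32, and it assumes those values are uniformly distributed. The helpers utf8Length and averageUtf8Bytes are hypothetical names for this illustration.

```javascript
// Hypothetical helper: number of bytes UTF-8 needs for one code point.
function utf8Length(codePoint) {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Average UTF-8 bytes per character, assuming every code point in [lo, hi] is equally likely.
function averageUtf8Bytes(lo, hi) {
  let total = 0;
  for (let cp = lo; cp <= hi; cp++) total += utf8Length(cp);
  return total / (hi - lo + 1);
}

// Assumption: compressToUTF16 emits 15 payload bits per character as code points 32..32799.
const utf16BitsPerChar = 15;
const utf16BytesPerChar = averageUtf8Bytes(32, 32 + (1 << 15) - 1);

// Base64 emits 6 payload bits per character, and every Base64 character is 1 ASCII byte.
const base64BitsPerChar = 6;
const base64BytesPerChar = 1;

console.log("Base64 chars / UTF-16 chars:", utf16BitsPerChar / base64BitsPerChar); // 2.5
console.log("UTF-16 style, bits per UTF-8 byte:", (utf16BitsPerChar / utf16BytesPerChar).toFixed(2)); // 5.11
console.log("Base64, bits per UTF-8 byte:", base64BitsPerChar / base64BytesPerChar); // 6
```

Under those assumptions, compressToUTF16 output averages about 2.94 UTF-8 bytes per character, roughly 5.11 payload bits per byte, while Base64 stays at 6 bits per byte.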

So I was hoping we could convert the encrypted data into a UTF-8 string, like you do with UTF-16.

Just to note: JavaScript does not support UTF-8 natively; it holds all text strings as UTF-16, which means the data can change when converting (one UTF-8 character might take more than one UTF-16 code unit, etc.).
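A quick way to see the difference: String.length counts UTF-16 code units, while TextEncoder (available in browsers and Node.js) produces UTF-8 bytes. sizes is a hypothetical helper for this illustration.

```javascript
const encoder = new TextEncoder();

// Hypothetical helper: three different "lengths" of the same string.
function sizes(s) {
  return {
    codePoints: [...s].length,          // string iteration walks Unicode code points
    utf16Units: s.length,               // String.length counts UTF-16 code units
    utf8Bytes: encoder.encode(s).length // TextEncoder always encodes as UTF-8
  };
}

console.log(sizes("A"));         // { codePoints: 1, utf16Units: 1, utf8Bytes: 1 }
console.log(sizes("\u20AC"));    // euro sign: { codePoints: 1, utf16Units: 1, utf8Bytes: 3 }
console.log(sizes("\u{1F600}")); // emoji: { codePoints: 1, utf16Units: 2, utf8Bytes: 4 }
```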

That said, if you need to send it as UTF-8, then something like https://gist.github.com/joni/3760795/8f0c1a608b7f0c8b3978db68105c5b1d741d0446 might be a good starting point for the conversion; you'll still need to send it as raw binary data from the array. Decoding is another matter, as the data has to be converted back to binary without letting JS itself convert it to UTF-16 (which would break it entirely).
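The linked gist is a hand-written UTF-8 encoder; in current browsers and Node.js, TextEncoder does the same job natively. This sketch illustrates the decoding caveat: valid text survives a string to UTF-8 bytes to string round trip, but arbitrary binary such as compressed output does not, because the default TextDecoder replaces every invalid sequence with U+FFFD. The byte values are made up for the demonstration.

```javascript
const encoder = new TextEncoder();
const decoder = new TextDecoder(); // UTF-8; invalid sequences become U+FFFD

// Valid text survives string -> UTF-8 bytes -> string.
const text = "h\u00E9llo \u{1F600}";
const textBack = decoder.decode(encoder.encode(text));

// Arbitrary binary does not: 0xFF is never valid UTF-8, and 0xC3 starts a
// two-byte sequence that 0x28 does not continue.
const binary = new Uint8Array([0x41, 0xff, 0xc3, 0x28]);
const asText = decoder.decode(binary);     // "A\uFFFD\uFFFD("
const binaryBack = encoder.encode(asText); // 8 bytes, not the original 4

console.log(textBack === text, binaryBack.length); // true 8
```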

Are you sure it wouldn't be easier to put a UTF-16 support library on the backend instead?

I did not know that JavaScript did not support UTF-8; this is the first time I've read about it.

That gist example does look perfect.

Pretty good write-up, including references: https://flaviocopes.com/javascript-unicode/

Okay, then that makes me wonder why the AJAX calls are breaking.
I will investigate that soon.

It does make sense now to change

Binary is always a safer transport mechanism than some encoding ;-)

Okay, I created my own fromUTF8() and toUTF8() functions, which convert a byte array into a readable string and back.

I understand how your _compress() function works, e.g. _compress(uncompressed, 8, function (a) { return f(a); });
But I have no idea how to set the second parameter of the _decompress() function.

Could you help with that?
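The pairs lz-string itself uses hint at the answer. As far as I can tell from the 1.4.x source, compressToBase64 calls _compress with 6 bits and decompressFromBase64 calls _decompress with 32; compressToUTF16 uses 15 and 16384; compress uses 16 and 32768. In each case the second argument of _decompress looks like 2^(bitsPerChar - 1), the mask of the highest bit in one output symbol, because _decompress reads each symbol from its top bit down. This is not lz-string code: resetValueFor, packBits and unpackBits are hypothetical helpers that mimic only the bit packing of _compress and the bit reading of _decompress, to show why the two numbers must match.

```javascript
// Hypothetical helper: mask of the highest bit in a symbol of bitsPerChar bits.
function resetValueFor(bitsPerChar) {
  return 1 << (bitsPerChar - 1);
}

// Writer in the style of _compress: shift bits in MSB-first, emit a symbol every bitsPerChar bits.
function packBits(bits, bitsPerChar) {
  const out = [];
  let val = 0;
  let pos = 0;
  for (const b of bits) {
    val = (val << 1) | b;
    if (++pos === bitsPerChar) {
      out.push(val);
      val = 0;
      pos = 0;
    }
  }
  if (pos > 0) out.push(val << (bitsPerChar - pos)); // pad the last symbol with zero bits
  return out;
}

// Reader in the style of _decompress: walk a mask from resetValue down to 1, then load the next symbol.
function unpackBits(symbols, count, resetValue) {
  const bits = [];
  let index = 0;
  let val = symbols[0];
  let position = resetValue;
  for (let i = 0; i < count; i++) {
    bits.push((val & position) ? 1 : 0);
    position >>= 1;
    if (position === 0) {
      position = resetValue;
      val = symbols[++index];
    }
  }
  return bits;
}

const bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1];
console.log(resetValueFor(8)); // 128
console.log(unpackBits(packBits(bits, 8), bits.length, resetValueFor(8)).join("") === bits.join("")); // true
```

If that reading is right, an 8-bit variant built with _compress(uncompressed, 8, function (a) { return f(a); }) would be decoded with something like LZString._decompress(compressed.length, 128, function (index) { return compressed.charCodeAt(index); }). The underscore-prefixed functions are internal, so they could change between releases.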

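// 7-bit values that are remapped to U+00E0 + (index in this table) instead of
// the default offset mapping in char() below. 0x93 is never looked up, since
// char() only receives values 0..127.
const UTF8convert = [
    0x01, 0x06, 0x3D, 0x3E, 0x5A,
    0x60, 0x62, 0x64, 0x68, 0x69, 0x6B, 0x70, 0x73, 0x93
];

// Packs an array of bytes into a string, 7 bits per character:
// every 7 input bytes (56 bits) become 8 output characters.
function toUTF8(data) {
    // Maps a 7-bit value (0..127) to a character.
    const char = (value) => {
        const q = UTF8convert.indexOf(value);
        if (q >= 0) {
            return String.fromCharCode(q + 0xE0);     // remapped values: U+00E0 onwards
        } else if (value < 0x5D) {
            return String.fromCharCode(value + 0x21); // U+0021..U+007D
        } else {
            return String.fromCharCode(value + 0x44); // U+00A1..U+00C3
        }
    };

    let str = '';
    let shift = 0; // high bits carried over from the previous byte

    for (let i = 0; i < data.length; i++) {
        let value = (data[i] << (i % 7)) + shift;
        shift = value >> 7;   // bits that do not fit into this character
        value = value & 0x7F; // the low 7 bits go out now
        str += char(value);
        if (i % 7 === 6) {
            // after 7 bytes the carry holds a full 7 bits: emit it as the 8th character
            str += char(shift);
            shift = 0;
        }
    }
    str += char(shift); // final carry character (fromUTF8 never starts a new byte from it)
    return str;
}

// Inverse of toUTF8: unpacks each group of 8 characters back into 7 bytes.
function fromUTF8(str) {
    const utf8 = [];
    let x = 0; // number of bytes pushed so far
    for (let i = 0; i < str.length; i++) {
        // Undo the character mapping to recover the 7-bit value.
        let charcode = str.charCodeAt(i);
        if (charcode >= 0xE0) {
            charcode = UTF8convert[charcode - 0xE0];
        } else if (charcode >= 0xA1) {
            charcode = charcode - 0x44;
        } else {
            charcode = charcode - 0x21;
        }
        // The low bits of this character complete the high bits of the previous byte.
        if ((i % 8) > 0) {
            utf8[utf8.length - 1] += (charcode << (8 - (i % 8))) & 0xFF;
        }
        // Every character except the 8th of each group and the final carry character starts a new byte.
        if ((i % 8) !== 7 && i < str.length - 1) {
            utf8.push(charcode >> (x % 7));
            x++;
        }
    }
    return utf8;
}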
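One way to judge whether the 7-bit scheme pays off is to count what its characters cost once they are sent as UTF-8. This snippet copies the UTF8convert table and the char() mapping from toUTF8 (renamed charFor7Bits here) and, assuming all 128 seven-bit values are equally likely, averages the UTF-8 length of the resulting characters. Characters from U+00A1 upward take two bytes in UTF-8, which is where the extra cost comes from.

```javascript
// Copied from the posted code so this snippet runs on its own.
const UTF8convert = [0x01, 0x06, 0x3D, 0x3E, 0x5A, 0x60, 0x62, 0x64, 0x68, 0x69, 0x6B, 0x70, 0x73, 0x93];

// Same mapping as char() inside toUTF8.
function charFor7Bits(value) {
  const q = UTF8convert.indexOf(value);
  if (q >= 0) return String.fromCharCode(q + 0xE0);
  return String.fromCharCode(value < 0x5D ? value + 0x21 : value + 0x44);
}

// Average UTF-8 bytes per output character, assuming all 128 values are equally likely.
function averageBytesPerChar() {
  const encoder = new TextEncoder();
  let total = 0;
  for (let v = 0; v < 128; v++) total += encoder.encode(charFor7Bits(v)).length;
  return total / 128;
}

const bytesPerChar = averageBytesPerChar();
console.log(bytesPerChar);                   // 1.3125
console.log((7 / bytesPerChar).toFixed(2));  // "5.33" payload bits per UTF-8 byte
```

Under that assumption, 88 of the 128 values map to single-byte characters and 40 to two-byte characters, so the scheme carries about 5.33 payload bits per UTF-8 byte, compared with 6 for Base64, whose characters are all single-byte ASCII.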

Given that JavaScript strings are UTF-16 rather than UTF-8, I think the best solution, if UTF-8 is truly needed, would be to use compressToUint8Array() and encode the resulting bytes with some implementation-specific function.
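As one example of such an implementation-specific encoding, this sketch turns a Uint8Array (such as the output of compressToUint8Array()) into unpadded Base64url text, which is safe in URLs, and back. bytesToBase64Url and base64UrlToBytes are hypothetical helpers written for illustration; they do not validate their input.

```javascript
const ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Encodes bytes as Base64url without '=' padding: 3 bytes -> 4 characters.
function bytesToBase64Url(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const n = (bytes[i] << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0);
    const chars = Math.ceil((Math.min(3, bytes.length - i) * 8) / 6); // 2, 3 or 4 symbols
    for (let k = 0; k < chars; k++) out += ALPHABET[(n >> (18 - 6 * k)) & 63];
  }
  return out;
}

// Decodes unpadded Base64url back into a Uint8Array: 4 characters -> 3 bytes.
function base64UrlToBytes(str) {
  const out = [];
  for (let i = 0; i < str.length; i += 4) {
    const group = str.slice(i, i + 4);
    let n = 0;
    for (let k = 0; k < 4; k++) n = (n << 6) | (k < group.length ? ALPHABET.indexOf(group[k]) : 0);
    const count = Math.floor((group.length * 6) / 8); // 2 chars -> 1 byte, 3 -> 2, 4 -> 3
    for (let k = 0; k < count; k++) out.push((n >> (16 - 8 * k)) & 0xff);
  }
  return new Uint8Array(out);
}

const encoded = bytesToBase64Url(new Uint8Array([0xfb, 0xff]));
console.log(encoded);                   // "-_8"
console.log(base64UrlToBytes(encoded)); // Uint8Array(2) [ 251, 255 ]
```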

This can probably be closed as "won't implement".

Not sure. Node.js is supposed to support it better, but we need compatibility between the front end and the back end. There are TextEncoder and TextDecoder to play with once everything else is updated properly, so I'm holding off on closing this until we've had a chance :-P
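TextEncoder and TextDecoder are globals in modern browsers and in recent Node.js releases, so the same code can run on both ends. By default TextDecoder silently replaces invalid UTF-8 with U+FFFD; passing { fatal: true } makes decode() throw instead, which is one way for the receiving side to notice that the bytes were damaged in transit. decodeUtf8Strict is a hypothetical helper name.

```javascript
// Hypothetical helper: returns the decoded string, or null if the bytes are not valid UTF-8.
function decodeUtf8Strict(bytes) {
  try {
    // fatal: true makes decode() throw a TypeError instead of inserting U+FFFD
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch (e) {
    return null;
  }
}

const sent = new TextEncoder().encode("ok \u2713"); // what the sender would transmit
console.log(decodeUtf8Strict(sent));                    // "ok ✓"
console.log(decodeUtf8Strict(new Uint8Array([0xff]))); // null
```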