[EUC-JP] U+4FFF (俿) is encoded as an IBM extended character (IBM拡張文字, 8FB1C8) instead of EUC-JP (F9BB)

Question

mercury233 opened this issue 3 years ago

const iconvLite = require("iconv-lite");

// U+4FFF is the character "俿".
const theChar = String.fromCharCode(0x4FFF);
// Encode it as EUC-JP.
const theEncodeResult = iconvLite.encode(theChar, 'EUC-JP');
// Decode both candidate byte sequences.
const theDecodeResult1 = iconvLite.decode(Buffer.from([0x8F, 0xB1, 0xC8]), 'EUC-JP');
const theDecodeResult2 = iconvLite.decode(Buffer.from([0xF9, 0xBB]), 'EUC-JP');

console.log(theChar);
console.log(theEncodeResult);
console.log(theDecodeResult1);
console.log('------');
console.log(theDecodeResult2);
// Do both byte sequences decode to the same string?
console.log(theDecodeResult1 === theDecodeResult2);

https://runkit.com/mercury233/6177adadef03d40008209995

As you can see, both 8FB1C8 and F9BB can be decoded, but encoding "俿" produces 8FB1C8 rather than the F9BB I expected.
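
For readers trying to tell the two byte sequences apart, the sketch below labels an EUC-JP byte sequence by its structural code set. It is a minimal illustration, not part of iconv-lite: `classifyEucJp` is a hypothetical helper, and it only inspects byte ranges, so it says nothing about which mapping a given conversion table treats as canonical. It assumes the usual EUC-JP layout, where a sequence starting with 0x8F is the three-byte JIS X 0212 form and a two-byte sequence maps to a row and cell by subtracting 0xA0 from each byte.

```javascript
// Hypothetical helper: classify one EUC-JP byte sequence by code set.
// It checks structure only; it does not consult any mapping table.
function classifyEucJp(bytes) {
  const isGR = (b) => b >= 0xa1 && b <= 0xfe; // valid row/cell byte
  const [b0, b1, b2] = bytes;

  if (bytes.length === 1 && b0 <= 0x7f) {
    return { codeSet: 0, charset: 'ASCII' };
  }
  if (bytes.length === 2 && b0 === 0x8e && b1 >= 0xa1 && b1 <= 0xdf) {
    return { codeSet: 2, charset: 'JIS X 0201 half-width katakana' };
  }
  if (bytes.length === 3 && b0 === 0x8f && isGR(b1) && isGR(b2)) {
    // 0x8F (SS3) introduces the three-byte form.
    return { codeSet: 3, charset: 'JIS X 0212', row: b1 - 0xa0, cell: b2 - 0xa0 };
  }
  if (bytes.length === 2 && isGR(b0) && isGR(b1)) {
    return { codeSet: 1, charset: 'JIS X 0208 plane', row: b0 - 0xa0, cell: b1 - 0xa0 };
  }
  return null; // not a single well-formed EUC-JP sequence
}

console.log(classifyEucJp([0x8f, 0xb1, 0xc8]));
console.log(classifyEucJp([0xf9, 0xbb]));
```

Run as-is, this reports 8F B1 C8 as code set 3 (row 17, cell 40) and F9 BB as code set 1 (row 89, cell 27). As far as I know, JIS X 0208 itself assigns nothing past row 84, so a row 89 mapping would come from a vendor extension in whichever EUC-JP variant the table follows. That is worth checking against a published mapping table before treating either form as wrong.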

Alexander Shtuchkin · Answer 1 · Tue Oct 26 2021 23:15:51 GMT+0800 (China Standard Time)

Thanks for the runkit link! I see "俿" is encoded as <8F, B1, C8> (theEncodeResult). What do you mean by "it can't be encoded correctly"? Is this encoding incorrect?

Mercury233 · Answer 2 · Wed Oct 27 2021 08:34:07 GMT+0800 (China Standard Time)

I know very little about character encoding, but I found that the EUC-JP code for "俿" may be F9BB, and iconv-lite can indeed decode it.

Alexander Shtuchkin · Answer 3 · Sat Oct 30 2021 06:16:42 GMT+0800 (China Standard Time)

Well, honestly, I don't know much about EUC-JP either :) Current behavior seems reasonable, so I'm not sure what to do here. Let me know if you learn anything more specific (ideally with a link to some kind of standard), I can then reopen the issue. Thanks!
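
Since the thread leaves the encoder unchanged, anyone who specifically needs F9BB in their output could post-process on their own side. The sketch below is one possible workaround under stated assumptions, not an iconv-lite feature: `encodeWithOverrides` and the `overrides` table are hypothetical names, and `fakeEncode` is a stand-in so the example runs without dependencies. Encoding one character at a time is only safe for stateless encodings, which EUC-JP is.

```javascript
// Hypothetical helper: encode `text` one code point at a time, substituting
// caller-supplied bytes for selected characters.
// `encodeChar` is any function (string) => Buffer.
function encodeWithOverrides(text, encodeChar, overrides) {
  const parts = [];
  for (const ch of text) { // for...of iterates by code point
    const forced = overrides.get(ch);
    parts.push(forced ? Buffer.from(forced) : encodeChar(ch));
  }
  return Buffer.concat(parts);
}

// Assumed override table: force U+4FFF to the two-byte form F9 BB.
const overrides = new Map([['\u4FFF', [0xf9, 0xbb]]]);

// Stand-in encoder for demonstration only. With iconv-lite this would be:
//   (ch) => iconvLite.encode(ch, 'EUC-JP')
const fakeEncode = (ch) => Buffer.from(ch, 'latin1');

console.log(encodeWithOverrides('A\u4FFFB', fakeEncode, overrides));
```

Whether F9BB is the right target depends on which EUC-JP variant the consumer of the bytes expects, so the override table should come from that system's documentation rather than from this example.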