ashtuchkin / iconv-lite

Convert character encodings in pure javascript.



Encode with Shift JIS but receive EUCJP

QuocNguyen799 opened this issue

I want to encode this character to Shift JIS: 髙
const encoded = iconv.encode('髙', "Shift_JIS")
But when I detect the encoding of "encoded" above, I get EUCJP instead of SJIS:
const detected = Encoding.detect(encoded);
And the bytes I get in "encoded" are: 8de8
But they should be: 3f
Checked with: https://www.skandissystems.com/testCharset.pl

Thanks for your reply, but the problem is not only that the detection is imprecise; the encoding is also incorrect.
This character 髙 should become '3f' when converted to Shift_JIS.

3f in Shift_JIS is just question mark "?". I assume it means that the script you're referring to doesn't know how to encode this character.

Also not sure where you're getting 8de8. On my machine I get bytes 0xFB 0xFC:

> iconv.encode('髙', "Shift_JIS")
<Buffer fb fc>

Checking in a recent browser that supports https://encoding.spec.whatwg.org/ (the main standard that iconv-lite follows), I see that this is indeed a correct encoding:

let dec = new TextDecoder("Shift_JIS");
let buf = Uint8Array.from([0xfb, 0xfc]);
document.body.innerText = dec.decode(buf);  // shows "髙"

The question mark "?" (3f) is exactly what I need, because this character belongs to EUC_JP, not Shift_JIS.
When I try it with PHP, it works:
$str = mb_convert_encoding('髙', "SJIS");
$str = mb_convert_encoding($str, "UTF-8", "SJIS");
var_dump($str);
I don't know much about encoding standards. Maybe iconv-lite and PHP follow different encoding standards.
Do you have any suggestions for this?
If not, I will close this issue.
And thank you for your time.

As far as I know, recent versions of Shift_JIS such as Shift_JIS-2004 can encode the characters that were previously only encodable with EUC_JP (see https://en.wikipedia.org/wiki/Shift_JIS#Shift_JISx0213_and_Shift_JIS-2004). I assume PHP does not support it, or is somehow more strict about using the older version of Shift_JIS?

Iconv-lite only supports the extended version of Shift_JIS. I don't think there's an easy way to restrict encoding to strict Shift_JIS. One hack I can think of is to replace all "unsupported" characters with an explicit "?" before encoding, but that requires knowing every one of these unsupported characters.
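A variant of that hack avoids listing characters up front by filtering the encoder's output instead of its input. The sketch below is an approximation under stated assumptions, not an iconv-lite feature: it assumes "strict" Shift_JIS means the JIS X 0208 rows only, i.e. double-byte sequences whose lead byte is 0x81-0x84, 0x88-0x9F or 0xE0-0xEA, and it replaces every other double-byte sequence with 0x3F. That covers the Windows-31J additions that the WHATWG Shift_JIS encoder can emit: NEC row 13 (lead byte 0x87) and the IBM extensions (lead bytes 0xFA-0xFC), which is where 0xFB 0xFC for 髙 sits. The function name `restrictToJisX0208` is hypothetical, and the check works at lead-byte granularity only.

```javascript
// Hypothetical post-processing step (not part of iconv-lite): takes bytes produced by
// iconv.encode(str, "Shift_JIS") and replaces every double-byte character whose lead
// byte is outside the JIS X 0208 rows with "?" (0x3F).

// Lead bytes that start a two-byte Shift_JIS sequence.
function isDoubleByteLead(b) {
  return (b >= 0x81 && b <= 0x9f) || (b >= 0xe0 && b <= 0xfc);
}

// Assumption: "strict" Shift_JIS = JIS X 0208 rows 1-8 and 15-84 (row 15 is empty).
function isJisX0208Lead(b) {
  return (
    (b >= 0x81 && b <= 0x84) || // rows 1-8: symbols, Latin, kana, Greek, Cyrillic, box drawing
    (b >= 0x88 && b <= 0x9f) || // rows 15-62: kanji
    (b >= 0xe0 && b <= 0xea)    // rows 63-84: kanji
  );
}

function restrictToJisX0208(buf) {
  const out = [];
  let i = 0;
  while (i < buf.length) {
    const b = buf[i];
    if (isDoubleByteLead(b) && i + 1 < buf.length) {
      // Keep the pair if its lead byte is in a JIS X 0208 row, otherwise emit "?".
      if (isJisX0208Lead(b)) out.push(b, buf[i + 1]);
      else out.push(0x3f);
      i += 2;
    } else {
      // Single-byte codes (ASCII, half-width katakana 0xA1-0xDF) or a lone trailing lead byte.
      out.push(b);
      i += 1;
    }
  }
  return Buffer.from(out);
}
```

With iconv-lite this would be applied as `restrictToJisX0208(iconv.encode(str, "Shift_JIS"))`; since 髙 encodes to 0xFB 0xFC, it comes out as a single 0x3F byte, while characters such as あ (0x82 0xA0) pass through unchanged.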