Inconsistent normalized values for some tags

vvanpo opened this issue 4 years ago

Victor van Poppelen commented 4 years ago

Example with Taiwanese:

$ node -e "console.log(require('bcp-47-normalize')('zh-Hans-TW'))"
zh-TW
$ node -e "console.log(require('bcp-47-normalize')('zh-TW'))"
zh-Hant

So if I understand correctly what this program is supposed to do, it's telling me that zh-TW is the normal form of a tag that includes the 'Hans' script, yet zh-TW itself is "further normalized" down to a tag with the 'Hant' script?
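One way to state the inconsistency: normalizing an already-normalized tag should not change it again. Below is a minimal sketch of that check. `findUnstableTags` is a hypothetical helper, and `reportedNormalize` is a stub that only hard-codes the two outputs shown above; it does not call the package.

```javascript
// Hypothetical helper: returns the tags whose normalized form changes
// again when it is normalized a second time (i.e. normalize is not idempotent).
function findUnstableTags(normalize, tags) {
  return tags.filter((tag) => {
    const once = normalize(tag);
    return normalize(once) !== once;
  });
}

// Stub reproducing only the two outputs reported in this issue (not the real package).
const reported = new Map([
  ['zh-Hans-TW', 'zh-TW'],
  ['zh-TW', 'zh-Hant'],
]);
const reportedNormalize = (tag) => reported.get(tag) ?? tag;

console.log(findUnstableTags(reportedNormalize, ['zh-Hans-TW', 'zh-TW']));
```

With the package installed, `require('bcp-47-normalize')` could be passed as the `normalize` argument instead of the stub, since the commands above call it as a function.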

Titus · Answer 1 · Fri Jul 24 2020 18:34:52 GMT+0800 (China Standard Time)

Seems like a bug. See here and here for the data. The last link includes zh_Hans, which seems to be why zh-Hans-TW incorrectly goes to zh-TW.

I wonder whether zh-Hans-TW should go to zh-Hans, but there is no data suggesting that as far as I can quickly see.
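The failure mode described here can be illustrated as dropping a script because it is the language's default, without checking whether the region implies a different script. This is a minimal sketch under stated assumptions: `DEFAULT_SCRIPT`, `REGION_SCRIPT`, `dropScriptNaive`, and `dropScriptGuarded` are all hypothetical names with a tiny hand-written data set, not the package's actual code or tables.

```javascript
// Hypothetical data for illustration only: the default script of a language,
// and the script implied by a language + region pair.
const DEFAULT_SCRIPT = { zh: 'Hans' };
const REGION_SCRIPT = { 'zh-CN': 'Hans', 'zh-TW': 'Hant', 'zh-HK': 'Hant', 'zh-MO': 'Hant' };

const join = (...parts) => parts.filter(Boolean).join('-');

// Naive rule: drop the script whenever it equals the language's default script.
function dropScriptNaive(language, script, region) {
  const keep = script !== DEFAULT_SCRIPT[language];
  return join(language, keep ? script : undefined, region);
}

// Guarded rule: drop the script only if the remaining tag still implies it.
function dropScriptGuarded(language, script, region) {
  const implied =
    (region && REGION_SCRIPT[`${language}-${region}`]) || DEFAULT_SCRIPT[language];
  const keep = script !== implied;
  return join(language, keep ? script : undefined, region);
}

console.log(dropScriptNaive('zh', 'Hans', 'TW')); // zh-TW
console.log(dropScriptGuarded('zh', 'Hans', 'TW')); // zh-Hans-TW
console.log(dropScriptGuarded('zh', 'Hans', 'CN')); // zh-CN
```

Under these assumptions the guarded rule leaves zh-Hans-TW unchanged, because the TW region implies Hant, so the explicit Hans script carries information.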

Titus · Answer 2 · Fri Jul 24 2020 21:10:23 GMT+0800 (China Standard Time)

Thanks for reporting, @vvanpo, released in 1.1.0!

Hieu Do · Answer 3 · Fri Aug 07 2020 12:53:09 GMT+0800 (China Standard Time)

@wooorm This fix results in zh-CN becoming zh, along with lots of other normalization changes. Is there a reason for this? Should it have been marked as a BREAKING CHANGE instead?

https://npm.runkit.com/bcp-47-normalize

const bcp47Normalize = require('bcp-47-normalize');

console.log(bcp47Normalize('zh-CN'));
console.log(bcp47Normalize('zh-TW'));
console.log(bcp47Normalize('zh-MO'));
console.log(bcp47Normalize('zh-HK'));

"zh"
"zh-Hant"
"zh-Hant-MO"
"zh-Hant-HK"

Titus · Answer 4 · Fri Aug 07 2020 14:48:36 GMT+0800 (China Standard Time)

Yup, that’s the goal of normalizing. In "Chinese as spoken in China", the "as spoken in China" part is implied, so zh-CN becomes zh.

These four all go through here: https://github.com/unicode-org/cldr/blob/4b1225ead2ca9bc7a969a271b9931f137040d2bf/common/supplemental/supplementalMetadata.xml#L177

And then a couple of them are defaults: https://github.com/unicode-org/cldr/blob/4b1225ead2ca9bc7a969a271b9931f137040d2bf/common/supplemental/supplementalMetadata.xml#L1539
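Reading the two links as (1) a replacement that expands each of these tags to a language-script-region form and (2) a list of defaults that let a full tag collapse to a shorter one, the two steps can be sketched as below. This is an assumption drawn from the explanation and the runkit output in this thread: `ALIASES`, `DEFAULTS`, and `normalizeSketch` are hypothetical, the tables are a hand-written excerpt covering only these four tags, and the package's real data and code differ.

```javascript
// Hypothetical excerpt: expand each legacy region tag to language-script-region.
const ALIASES = new Map([
  ['zh-CN', 'zh-Hans-CN'],
  ['zh-TW', 'zh-Hant-TW'],
  ['zh-HK', 'zh-Hant-HK'],
  ['zh-MO', 'zh-Hant-MO'],
]);

// Hypothetical excerpt: each entry maps a default tag to the shorter tag it is the default for.
const DEFAULTS = new Map([
  ['zh-Hans-CN', 'zh-Hans'],
  ['zh-Hans', 'zh'],
  ['zh-Hant-TW', 'zh-Hant'],
]);

function normalizeSketch(tag) {
  // Step 1: replace an alias with its full form.
  let result = ALIASES.get(tag) ?? tag;
  // Step 2: keep collapsing defaults until no rule applies.
  while (DEFAULTS.has(result)) {
    result = DEFAULTS.get(result);
  }
  return result;
}

for (const tag of ['zh-CN', 'zh-TW', 'zh-MO', 'zh-HK']) {
  console.log(tag, '->', normalizeSketch(tag));
}
```

Under these assumptions zh-CN collapses twice (zh-Hans-CN, then zh-Hans, then zh), zh-TW once, and zh-MO and zh-HK not at all, which matches the four results printed above.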

I’d normally consider it breaking, but the previous behavior was broken.