std/text/unicode width does not return column-width 2 for emojis
erf opened this issue · comments
I expected the width function to return 2 for emojis, given that it uses the EastAsianWidth.txt file.
println(width("👾"))
This returns 1.
Is this method supposed to work like the display_width method of ziglyph, or like this Python wcwidth specification? That is, should it give the rendered column width for modern terminal emulators using the latest Unicode standard?
This is what the standard library documentation says:
// Return the column-width of a unicode character.
// Equivalent to ``wcwidth``
pub fun char/width( c : char ) : int {
  if (zero-widths.force.contains(c.int)) then 0
  elif (asian-wide.force.contains(c.int)) then 2
  else 1
}

// Return the total column-width of a string.
pub fun string/width( s : string ) : int {
  var total := 0
  s.foreach( fn(c) {
    total := total + c.width
  })
  total
}
So yes, I believe the intent is to target terminal emulators, as in the Python wcwidth spec. However, I'm not certain it is currently up to date (I'm not sure when Daan last updated the asian-wide list).
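For comparison, here is a minimal Python sketch of the same three-way lookup (zero width, wide, otherwise 1). It is not Koka's implementation: it uses the East Asian Width data bundled with Python's `unicodedata` module instead of Koka's `zero-widths` and `asian-wide` tables, and the zero-width rule (combining marks and format characters) is an assumption made for illustration. The names `char_width` and `string_width` are hypothetical.

```python
import unicodedata

# Assumption: treat combining marks and format characters as zero width.
# Real wcwidth implementations use more detailed tables.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def char_width(ch: str) -> int:
    """Column width of a single code point: 0, 1, or 2."""
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return 0
    # "W" (Wide) and "F" (Fullwidth) occupy two terminal columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def string_width(s: str) -> int:
    """Sum of per-code-point widths, mirroring string/width above."""
    return sum(char_width(ch) for ch in s)


if __name__ == "__main__":
    print(string_width("👾"))
```

With a reasonably recent Unicode database, U+1F47E is classified as Wide, so a table generated from a current EastAsianWidth.txt would be expected to give 2 for this emoji. Like the Koka code, this sums widths per code point and does not handle grapheme clusters.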
Also, I would expect the following to print two UTF-16 code units, but it prints only one UTF-32 character. I'm less certain about the intended underlying representation for characters, so I'll have to ask Daan.
"👾".slice.foreach(fn(c) c.println)
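To make the two possible outcomes concrete, this small Python sketch (not Koka, and saying nothing about Koka's internal string representation) lists the same emoji once as code points and once as UTF-16 code units. The helper names `code_points` and `utf16_units` are hypothetical.

```python
def code_points(s: str) -> list[int]:
    """One entry per Unicode code point (the UTF-32 view)."""
    return [ord(ch) for ch in s]


def utf16_units(s: str) -> list[int]:
    """One entry per 16-bit code unit (the UTF-16 view)."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    emoji = "👾"
    print([hex(cp) for cp in code_points(emoji)])  # ['0x1f47e']
    print([hex(u) for u in utf16_units(emoji)])    # ['0xd83d', '0xdc7e']
```

Iterating by code point yields a single value, U+1F47E, while iterating by UTF-16 code unit yields the surrogate pair D83D DC7E. Printing one character is therefore what a code-point-based `char` type would produce.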
I'll just link this article here. It's a good read with some valuable links:
https://mitchellh.com/writing/grapheme-clusters-in-terminals
Thanks for the link!
Great post. I'll have to look at the algorithm he references to improve Koka's clustering.