koka-lang / koka

Koka language compiler and interpreter

Home Page: http://koka-lang.org

std/text/unicode width does not return column-width 2 for emojis

erf opened this issue

I expected the width function to return 2 for emojis, since it is based on the EastAsianWidth.txt file.

  println(width("👾"))

This returns 1.

Is this method supposed to work like the display_width method of ziglyph, or like this Python wcwidth specification? That is, should it give the rendered column width in modern terminal emulators, using the latest Unicode standard?

This is what the standard library documentation says:

// Return the column-width of a unicode character.
// Equivalent to ``wcwidth``
pub fun char/width( c : char ) : int {
  if (zero-widths.force.contains(c.int)) then 0
  elif (asian-wide.force.contains(c.int)) then 2
  else 1
}

// Return the total column-width of a string.
pub fun string/width( s : string ) : int {
  var total := 0
  s.foreach( fn(c) {
    total := total + c.width
  })
  total
}

So yes, I believe the intent is to match terminal emulators, as in the Python wcwidth spec. However, I'm not certain it is currently up to date (I'm not sure when Daan last updated the asian-wide list).
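For comparison, here is a minimal Python sketch with the same shape as char/width: zero-width check first, then wide check, else 1. It uses Python's built-in unicodedata module as a stand-in for the zero-widths and asian-wide tables; that substitution is an assumption for illustration, not Koka's implementation. Since Unicode 9.0, characters with default emoji presentation, such as U+1F47E, are classified East Asian Wide ("W"), so an up-to-date table gives 2 for "👾".

```python
import unicodedata


def char_width(c: str) -> int:
    """Approximate wcwidth-style column width of a single code point."""
    # Stand-in for `zero-widths`: nonspacing/enclosing marks and format chars.
    if unicodedata.category(c) in ("Mn", "Me", "Cf"):
        return 0
    # Stand-in for `asian-wide`: East Asian Wide (W) or Fullwidth (F).
    if unicodedata.east_asian_width(c) in ("W", "F"):
        return 2
    return 1


def string_width(s: str) -> int:
    """Total column width as the sum of per-code-point widths, like string/width."""
    return sum(char_width(c) for c in s)


print(unicodedata.east_asian_width("\U0001F47E"))  # W
print(string_width("\U0001F47E"))  # 2
```

If Koka's asian-wide list predates Unicode 9, that alone would explain the result of 1.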

I would also expect the following to print two UTF-16 characters, but it prints only one UTF-32 character. I guess I'm less certain about the intended underlying representation for characters. I'll have to ask Daan.
"👾".slice.foreach(fn(c) c.println)
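To make the two representations concrete, this Python sketch shows that "👾" is one code point (U+1F47E) but two UTF-16 code units (a surrogate pair). Python is used only for illustration and says nothing about how Koka stores strings; the helper utf16_units is hypothetical.

```python
from typing import List


def utf16_units(s: str) -> List[int]:
    """Split a string into UTF-16 code units, computing surrogate pairs by hand."""
    units: List[int] = []
    for ch in s:
        cp = ord(ch)
        if cp < 0x10000:
            units.append(cp)  # BMP code point: a single code unit
        else:
            v = cp - 0x10000  # 20-bit value spread over two surrogates
            units.append(0xD800 + (v >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (v & 0x3FF))  # low (trail) surrogate
    return units


alien = "\U0001F47E"
print(len(alien))                            # 1 code point
print([hex(u) for u in utf16_units(alien)])  # ['0xd83d', '0xdc7e']
# Cross-check with Python's UTF-16 encoder (2 bytes per code unit).
print(len(alien.encode("utf-16-le")) // 2)   # 2
```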

I'll just link this article here. It's a good read with some valuable links.

https://mitchellh.com/writing/grapheme-clusters-in-terminals

Thanks for the link!

Great post. I'll have to look at the algorithm he references to improve Koka's clustering.
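As a rough illustration of why clustering matters for width, here is a heavily simplified Python sketch. It is not the UAX #29 segmentation algorithm. It only attaches combining marks, ZWJ-joined code points, variation selectors and emoji skin-tone modifiers to the preceding code point. It then measures each cluster by its first code point, with VS16 (U+FE0F) forcing width 2 and VS15 (U+FE0E) forcing width 1. These rules and the names clusters and cluster_width are assumptions for illustration.

```python
import unicodedata
from typing import List

ZWJ, VS15, VS16 = "\u200d", "\ufe0e", "\ufe0f"


def is_extend(ch: str) -> bool:
    """Code points that, in this simplified model, attach to the previous one."""
    return (
        unicodedata.category(ch) in ("Mn", "Me")  # combining / enclosing marks
        or ch in (ZWJ, VS15, VS16)
        or 0x1F3FB <= ord(ch) <= 0x1F3FF          # emoji skin-tone modifiers
    )


def clusters(s: str) -> List[str]:
    out: List[str] = []
    join_next = False  # set after a ZWJ: the next code point joins the cluster
    for ch in s:
        if out and (join_next or is_extend(ch)):
            out[-1] += ch
        else:
            out.append(ch)
        join_next = ch == ZWJ
    return out


def cluster_width(g: str) -> int:
    if VS16 in g:
        return 2  # emoji presentation requested
    if VS15 in g:
        return 1  # text presentation requested
    return 2 if unicodedata.east_asian_width(g[0]) in ("W", "F") else 1


def string_width(s: str) -> int:
    return sum(cluster_width(g) for g in clusters(s))


family = "\U0001F469\u200d\U0001F469\u200d\U0001F467"  # woman, woman, girl via ZWJ
print(len(clusters(family)), string_width(family))
```

With these rules the family ZWJ sequence becomes a single cluster of width 2. Summing per-code-point widths would instead count each of the three emoji separately.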

Yeah, I'm a beta tester on the Ghostty terminal (it's great!), and they have implemented Mode 2027 for proper Unicode handling.
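Mode 2027 is a DEC private mode, so the usual DECSET/DECRST and DECRQM sequences should apply. CSI ? 2027 h enables grapheme-cluster width handling, CSI ? 2027 l disables it, and CSI ? 2027 $ p queries it. A supporting terminal replies CSI ? 2027 ; Ps $ y. The Python sketch below only builds these strings and parses a reply. Actually reading the reply from a terminal (raw mode, timeouts) is left out, and the function names are hypothetical.

```python
import re

CSI = "\x1b["
ENABLE_2027 = CSI + "?2027h"   # DECSET: turn grapheme clustering on
DISABLE_2027 = CSI + "?2027l"  # DECRST: turn it off
QUERY_2027 = CSI + "?2027$p"   # DECRQM: ask for the mode's current state

# DECRPM reply format: CSI ? 2027 ; Ps $ y
_REPLY = re.compile(r"\x1b\[\?2027;(\d)\$y")

STATES = {0: "not recognized", 1: "set", 2: "reset",
          3: "permanently set", 4: "permanently reset"}


def parse_2027_reply(reply: str) -> str:
    """Map a DECRPM reply for mode 2027 to a readable state."""
    m = _REPLY.search(reply)
    if m is None:
        return "no reply"  # e.g. the terminal ignored the query
    return STATES.get(int(m.group(1)), "unknown")


def supports_2027(reply: str) -> bool:
    """True if the mode is recognized and can be (or already is) enabled."""
    return parse_2027_reply(reply) in ("set", "reset", "permanently set")
```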