google / starlark-go

Starlark in Go: the Starlark configuration language, implemented in Go

Bug: incorrect `len()` for UTF-8 Chars

Starshipping opened this issue.

Python:

Python 3.11.3 (main, May  3 2023, 23:19:07) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = "資源互換檔案格式"
>>> len(a)
8

Starlark Go:

Welcome to Starlark (go.starlark.net)
>>> a = "資源互換檔案格式"
>>> len(a)
24

Python 3's strings are sequences of Unicode code points, and "資源互換檔案格式" contains 8 of them. Starlark strings, however, are sequences of UTF-K code units, where K is 8 in the Go implementation and 16 in the Java implementation. In the Go implementation the string therefore has length 24, because each Hanzi has a 3-byte UTF-8 encoding (8 characters × 3 bytes). So this is working as intended.
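The three ways of counting can be reproduced in plain Go with only the standard library. This is a minimal sketch and not starlark-go's own code. The helper names `byteLen`, `codePointLen` and `utf16Len` are hypothetical. `len` on a Go string counts bytes (UTF-8 code units), `utf8.RuneCountInString` counts code points as Python 3 does, and encoding to UTF-16 counts the units a UTF-16-based implementation would see.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// byteLen returns the number of UTF-8 code units (bytes); this is what
// Go's built-in len reports for a string.
func byteLen(s string) int { return len(s) }

// codePointLen returns the number of Unicode code points, which is what
// Python 3's len(str) counts.
func codePointLen(s string) int { return utf8.RuneCountInString(s) }

// utf16Len returns the number of UTF-16 code units. Code points outside
// the Basic Multilingual Plane are encoded as surrogate pairs and count twice.
func utf16Len(s string) int { return len(utf16.Encode([]rune(s))) }

func main() {
	a := "資源互換檔案格式"
	// Each of these Han characters is in the BMP and takes 3 bytes in UTF-8.
	fmt.Println(byteLen(a), codePointLen(a), utf16Len(a)) // prints "24 8 8"
}
```

Here the UTF-16 count matches the code-point count only because every character lies in the Basic Multilingual Plane. A character such as "😀" (U+1F600) is 4 bytes in UTF-8, 2 UTF-16 code units, and 1 code point.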