jquast / blessed

Blessed is an easy, practical library for making Python terminal apps

Home Page: http://pypi.python.org/pypi/blessed



Sequence.length() not always accurate (e.g. Left-to-right Isolate/

tiptenbrink opened this issue

Thanks for the great library! It's a lot more robust and easier to work with than some alternatives.

Sequence.length() uses jquast/wcwidth internally. Unfortunately, wcwidth is not accurate for all Unicode characters, including LRI (U+2066) and PDI (U+2069). It returns 1 for both, but they should have length zero because they are not printed in the terminal (I'm using GNOME Terminal). This corresponds to jquast/wcwidth#26. One possible fix would be to replace wcwidth with cwcwidth, which is used by curtsies (and bpython) and, as a bonus, has a much faster implementation.

Context

Some terminals may show these with a width of 1, but that would be incorrect behavior, as LRI and PDI are only supposed to affect directionality (for LTR and RTL scripts) and not be displayed. For example, if you want to display individual Hebrew characters not as meaningful text but as a binary decoding (which is my strange use case), you want to print each character as '⁦א⁩' (there's an LRI and a PDI on the left and right side of the character, respectively), so that when you combine several, the string is displayed in memory order, e.g. '⁦א⁩⁦ל⁩'. If you printed it normally, you'd get 'אל'. As this text also shows, the isolates themselves are invisible.
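A minimal Python sketch of the wrapping described above. `isolate_each` is a hypothetical helper name, and the only assumption is that every character gets its own LRI/PDI pair. The wrapped string holds three code points per visible letter, so a width function that counts LRI and PDI as one column each would report 6 columns for the two-letter example instead of 2.

```python
# LEFT-TO-RIGHT ISOLATE (U+2066) and POP DIRECTIONAL ISOLATE (U+2069)
LRI = "\u2066"
PDI = "\u2069"


def isolate_each(text: str) -> str:
    """Wrap every character in its own LRI ... PDI pair (hypothetical helper)."""
    return "".join(f"{LRI}{ch}{PDI}" for ch in text)


word = "\u05d0\u05dc"  # Hebrew alef + lamed, the 'אל' example
isolated = isolate_each(word)
# Three code points per visible character: LRI, the letter, PDI.
print(len(word), len(isolated))  # 2 6
```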

Of course, some editors (e.g. IntelliJ) do display these characters, because they can be sneaked into source code to alter it (see, for example, the security issue that prompted the recent Rust 1.56.1 release), and people who view code in the terminal might also want a visible marker that reveals their presence. But that should not be the default: in actual display strings, these characters should be invisible.

@jquast will need to take a look at this when he gets the time. Just wanted to comment so you know it's not being ignored.

I see that cwcwidth includes category 'Cf' in its zero-width table and wcwidth does not; that is the problem. I will try to address it in wcwidth in the coming weeks. Thanks!
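A simplified Python sketch of the rule described above: treat general category 'Cf' (format characters, which include LRI and PDI) as zero width, alongside combining marks. `sketch_wcwidth` and `sketch_wcswidth` are hypothetical names, not the wcwidth or cwcwidth API, and the lookup is a rough model built on the standard `unicodedata` module rather than either library's actual tables.

```python
import unicodedata

# Hypothetical sketch, not the wcwidth or cwcwidth API.
# Assumption: combining marks (Mn, Me) and format characters (Cf, which
# includes LRI U+2066 and PDI U+2069) occupy zero terminal columns.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def sketch_wcwidth(ch: str) -> int:
    """Estimate the column width of a single printable character."""
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return 0
    # East Asian Wide (W) and Fullwidth (F) characters take two columns.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    # Everything else, including Hebrew letters, takes one column.
    # Control characters (Cc) are out of scope for this sketch.
    return 1


def sketch_wcswidth(text: str) -> int:
    """Sum the per-character widths of a string."""
    return sum(sketch_wcwidth(ch) for ch in text)


isolated = "\u2066\u05d0\u2069\u2066\u05dc\u2069"  # LRI alef PDI LRI lamed PDI
print(len(isolated), sketch_wcswidth(isolated))  # 6 2
```

A category check alone is not the whole story: real width tables carry exceptions. For example, Markus Kuhn's original `wcwidth` gives the soft hyphen U+00AD, which is also category 'Cf', a width of 1.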

Thanks for looking into it!

I know it's been a long time, but this is resolved in today's release of wcwidth by jquast/wcwidth#91.

best wishes