ruby / error_highlight

The gem enhances Exception#message by adding a short explanation of where the exception is raised.

Unicode characters

mame opened this issue

Currently, error_highlight does not handle Unicode characters well. There are two sub-issues.

  1. RubyVM::AbstractSyntaxTree::Node#first_column and #last_column seem to return the column in bytes, but String#match handles the index in characters. We need to convert the column indices.
  2. Some Unicode characters occupy two (or more?) columns in a terminal with a monospace font.
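For sub-issue (1), the conversion itself is small. A minimal sketch, assuming the line is a UTF-8 String and the byte column falls on a character boundary (as parser columns do); `byte_col_to_char_col` is a hypothetical helper name:

```ruby
# Hypothetical helper: turn a byte-based column (like the ones the parser
# reports) into a character index usable with String#[] and String#match.
def byte_col_to_char_col(line, byte_col)
  # byteslice cuts the prefix in bytes; #length then counts its characters.
  line.byteslice(0, byte_col).length
end

line = "x = \"あい\" + 1"
byte_col = "x = \"あい\" ".bytesize           # 13: あ and い are 3 bytes each in UTF-8
char_col = byte_col_to_char_col(line, byte_col) # 9

# String#match takes its start position in characters, so use char_col here.
m = line.match(/\+/, char_col)
puts "byte #{byte_col} -> char #{char_col}, matched #{m[0].inspect} at #{m.begin(0)}"

# Passing the byte column instead starts the search past the "+" and finds nothing.
naive = line.match(/\+/, byte_col)
p naive
```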

(1) is relatively simple, but (2) is a bit tough. It requires a table that tells how many columns each character occupies. Reline is known to have such a table. But because error_highlight is a built-in gem that is loaded when the Ruby process starts, it is not good for error_highlight to depend on Reline (unless we make Reline a special built-in gem). We need to discuss how to make the table available to error_highlight.

Hey @mame!

I hit this same thing with ripper when I was writing prettier. I ended up solving it by taking the source, splitting it into multiple lines, and converting each into an object that responded to #[] so that I could get the right indices.

Here are some links to the source:

I hope it's helpful!
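The approach described above (split the source into lines, then wrap each line in an object whose #[] maps a byte offset to a character offset) might look roughly like this. It is a minimal sketch with a hypothetical class name, not the code from the linked source:

```ruby
# Hypothetical per-line lookup: #[] takes a byte offset, returns a char offset.
class ByteToCharIndex
  def initialize(line)
    @indices = []
    line.each_char.with_index do |char, char_index|
      # Every byte of a multibyte character maps to that character's index.
      char.bytesize.times { @indices << char_index }
    end
    # Also accept the offset one past the end (e.g. an exclusive end column).
    @indices << line.length
  end

  def [](byte_offset)
    @indices.fetch(byte_offset)
  end
end

source = "x = 1\ny = \"あい\" + 2\n"
# One lookup object per source line, indexed like the lines themselves.
lookups = source.lines.map { |line| ByteToCharIndex.new(line) }

puts lookups[1][13]  # byte 13 of line 2 is the "+", which is character 9
```

Precomputing the array trades a little memory per line for O(1) lookups, which suits a formatter that asks for many offsets on the same line.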

Thanks for the information. I think it addresses issue (1) that I described. Yes, that one is solvable by converting the indices.

The tougher issue is (2). Unfortunately, some Unicode characters (mainly Chinese, Japanese, and Korean characters) are rendered two columns wide.

For example, a single Japanese letter takes two columns in the terminal. To highlight such a letter, we need to put two ^s under it. To implement this, error_highlight needs a table telling which characters take two (or more) columns.
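A minimal sketch of that caret logic, assuming a hypothetical width table that only knows Hiragana, Katakana, and the main CJK ideograph block (a real one would be much larger). Both the padding and the number of ^s are measured in display columns, not characters:

```ruby
# Hypothetical, tiny set of double-width ranges for illustration only.
WIDE = [0x3041..0x30FF, 0x4E00..0x9FFF].freeze

def display_width(str)
  str.each_char.sum { |ch| WIDE.any? { |r| r.cover?(ch.ord) } ? 2 : 1 }
end

# Build the marker line for line[char_start...char_end]: pad by the display
# width of the prefix, then one ^ per display column of the highlighted text.
def underline(line, char_start, char_end)
  " " * display_width(line[0...char_start]) +
    "^" * display_width(line[char_start...char_end])
end

line = "x = \"あ\""
puts line
puts underline(line, 5, 6)   # あ is one character but two columns: "^^"
```

The prefix matters too: a wide character before the highlighted span shifts the carets right by an extra column.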

Just FYI: to make matters worse, the column count may change depending on the font and the terminal. This issue is called East Asian Width:

Ambiguous width characters are all those characters that can occur as fullwidth characters in any of a number of East Asian legacy character encodings. They have a “resolved” width of either narrow or wide depending on the context of their use.

To be honest, I don't want to face this problem for now 😇
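Because ambiguous-width characters have no fixed width, any such table would also need a knob for them. A sketch of that, assuming the caller somehow knows the terminal's setting; the ranges are a tiny illustrative subset (§ and part of the box-drawing block are listed as "A" in EastAsianWidth.txt), and the `ambiguous_width:` keyword is hypothetical:

```ruby
# Illustrative subsets only; a real table comes from EastAsianWidth.txt.
WIDE_RANGES      = [0x3041..0x30FF, 0x4E00..0x9FFF].freeze   # always 2 columns
AMBIGUOUS_RANGES = [0x00A7..0x00A7, 0x2500..0x254B].freeze   # § and box drawing

# ambiguous_width: 1 for a "narrow" context, 2 for a "wide" (CJK) context.
def char_width(ch, ambiguous_width: 1)
  cp = ch.ord
  return 2 if WIDE_RANGES.any? { |r| r.cover?(cp) }
  return ambiguous_width if AMBIGUOUS_RANGES.any? { |r| r.cover?(cp) }
  1
end

def display_width(str, ambiguous_width: 1)
  str.each_char.sum { |ch| char_width(ch, ambiguous_width: ambiguous_width) }
end

box = "─┐"
puts display_width(box)                     # 2 in a narrow context
puts display_width(box, ambiguous_width: 2) # 4 when the terminal renders them wide
```

The same source line can therefore need a different number of ^s depending on a setting that is not visible from the String itself.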

@mame I see, I think I understand the problem better now. In that case, it would probably be nice for RubyVM::AbstractSyntaxTree::Node to have methods like {first,last}_character_column or something similar.
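Until something like that exists, the same numbers can be derived from the existing byte columns and the source text. A sketch, assuming CRuby (RubyVM::AbstractSyntaxTree is CRuby-specific) and access to the source lines; `first_character_column` and `last_character_column` here are hypothetical free-standing helpers, not methods Node actually has:

```ruby
# Hypothetical helpers (not real Node methods): convert a node's byte-based
# columns into character columns using the line of source they refer to.
def first_character_column(node, source_lines)
  line = source_lines[node.first_lineno - 1]
  line.byteslice(0, node.first_column).length
end

def last_character_column(node, source_lines)
  line = source_lines[node.last_lineno - 1]
  line.byteslice(0, node.last_column).length
end

source = "\"あい\".foo\n"
root = RubyVM::AbstractSyntaxTree.parse(source)  # CRuby only
call = root.children.last                        # the `"あい".foo` call node
lines = source.lines

p [call.first_column, call.last_column]                                      # byte columns
p [first_character_column(call, lines), last_character_column(call, lines)]  # char columns
```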