Process character-width specifiers in text

Question

samliddicott opened this issue a year ago

Control of character width may be specifiable with CSI escape sequences, and I note that these sequences are terminated by NL among others. See #511

This is mentioned in https://www.cl.cam.ac.uk/~mgk25/ucs/scw-proposal.html

Set Character Width proposal (version 3)
by Markus Kuhn

This proposal adds a new control sequence to those defined in the ISO 6429 (= ECMA-48) standard, to allow applications to specify exactly, which ISO 10646 character sequences shall be displayed as non-spacing, single-width or double-width characters or ligatures (as needed for ideographic languages).

Martin Geisler · Answer 1 · Sun Jun 18 2023 16:45:04 GMT+0800 (China Standard Time)

Thanks for linking to this proposal, which I had not heard about.

Right now, Textwrap simply ignores CSI color sequences. More precisely, if it finds the two bytes defined here:

/// The CSI or “Control Sequence Introducer” introduces an ANSI escape
/// sequence. This is typically used for colored text and will be
/// ignored when computing the text width.
const CSI: (char, char) = ('\x1b', '[');

then it will ignore all characters until it sees a character in this range:

/// The final bytes of an ANSI escape sequence must be in this range.
const ANSI_FINAL_BYTE: std::ops::RangeInclusive<char> = '\x40'..='\x7e';

I might have gotten those ranges from Wikipedia; I'm no longer sure.
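To make the skipping rule concrete, here is a minimal sketch that uses only the standard library. It is not the actual Textwrap source: `visible_width` is a hypothetical helper, and it assumes every character outside an escape sequence occupies exactly one column.

```rust
/// The two characters that introduce an ANSI escape sequence.
const CSI: (char, char) = ('\x1b', '[');
/// Final characters of an ANSI escape sequence fall in this range.
const ANSI_FINAL_BYTE: std::ops::RangeInclusive<char> = '\x40'..='\x7e';

/// Hypothetical helper: count the characters of `text` that are not
/// part of a CSI sequence. Each remaining character counts as one
/// column, which is a simplification (wide characters are not handled).
fn visible_width(text: &str) -> usize {
    let mut width = 0;
    let mut chars = text.chars().peekable();
    while let Some(ch) = chars.next() {
        if ch == CSI.0 && chars.peek() == Some(&CSI.1) {
            chars.next(); // consume the '['
            // Skip everything up to and including the final character.
            for c in chars.by_ref() {
                if ANSI_FINAL_BYTE.contains(&c) {
                    break;
                }
            }
        } else {
            width += 1;
        }
    }
    width
}
```

With this sketch, `visible_width("\x1b[31mred\x1b[0m")` evaluates to 3, because both color sequences are skipped and only `red` is counted.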

Processing the newly proposed control sequences can be done today by providing your own custom implementation of the Fragment trait. This trait is used for all wrapping computations: it tells the library the size of a single unbreakable block of text. In particular, you can use the wrap_optimal_fit algorithm with your own custom fragment and get beautifully wrapped lines of text.
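As a rough illustration of the idea, the sketch below compiles without the crate. Everything in it is an assumption made for illustration: the `Fragment` trait is a local stand-in modeled on Textwrap's trait (the real signatures may differ between versions), `SizedWord` is a hypothetical type, and `wrap_greedy` is a simple first-fit loop standing in for wrap_optimal_fit, not that algorithm.

```rust
/// Local stand-in for Textwrap's `Fragment` trait, so that this sketch
/// compiles on its own. The real trait's signatures may differ.
trait Fragment {
    fn width(&self) -> f64;
    fn whitespace_width(&self) -> f64;
    fn penalty_width(&self) -> f64;
}

/// Hypothetical fragment whose width is set explicitly, for example
/// from a width-setting control sequence parsed by the application.
#[derive(Debug, PartialEq)]
struct SizedWord {
    text: String,
    columns: f64,
}

impl Fragment for SizedWord {
    fn width(&self) -> f64 {
        self.columns
    }
    fn whitespace_width(&self) -> f64 {
        1.0 // one space between fragments
    }
    fn penalty_width(&self) -> f64 {
        0.0 // no hyphen or other break marker
    }
}

/// Greedy first-fit wrapping over any `Fragment`. This is a simpler
/// stand-in for `wrap_optimal_fit`, not that algorithm.
fn wrap_greedy<T: Fragment>(fragments: &[T], line_width: f64) -> Vec<&[T]> {
    let mut lines = Vec::new();
    let mut start = 0;
    let mut used = 0.0;
    for (i, fragment) in fragments.iter().enumerate() {
        // Break before this fragment if it no longer fits on the line.
        if i > start && used + fragment.width() + fragment.penalty_width() > line_width {
            lines.push(&fragments[start..i]);
            start = i;
            used = 0.0;
        }
        used += fragment.width() + fragment.whitespace_width();
    }
    if start < fragments.len() {
        lines.push(&fragments[start..]);
    }
    lines
}
```

The point of the design is that the wrapping code only ever asks a fragment for its widths, so an application that has parsed width specifiers can report whatever column count it wants without the wrapping code knowing about escape sequences.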

Does that help with your use-case?