Outputs a byte-level regex that matches the UTF-8 encoding of every codepoint matched by an ICU Unicode regex.
# all Chinese characters
./charclass '\p{Han}'
# horizontal whitespace
./charclass '\h'
The \p escape is especially powerful because it matches codepoints by Unicode property, such as a script or a general category.
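To get a feel for what selecting codepoints by property means, here is a minimal sketch in Python. It is not how charclass works: it uses Python's standard `unicodedata` module as a stand-in for ICU, and it covers only the general category (the `Zs` in `\p{Zs}`), not scripts such as `Han`. The helper name `codepoints_with_category` is made up for this illustration, and the exact result depends on the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def codepoints_with_category(category):
    """Return every codepoint whose Unicode general category equals `category`."""
    return [
        cp
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    ]


# "Zs" is the space separator category, roughly what \p{Zs} selects.
space_separators = codepoints_with_category("Zs")
print(["U+{:04X}".format(cp) for cp in space_separators])
```

Note that the tab character is not in this set, because its general category is `Cc` (control), not `Zs`.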
To use the regexes, give them aliases in your Flex file:
/* from ./charclass '\h' */
whitespace \x09|\x20|\xc2\xa0|\xe1\x9a\x80|\xe2\x80[\x80-\x8a]|\xe2\x80\xaf|\xe2\x81\x9f
%%
{whitespace} { /* ... */ }
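The whitespace alias above is an alternation of UTF-8 byte sequences, with runs in the final byte collapsed into a bracket range. The following Python sketch shows one way such a pattern could be built from a list of codepoints. It is an illustration only, not charclass's actual implementation: the function names `hex_escape` and `utf8_regex` are hypothetical, and the codepoint list is decoded by hand from the alias rather than taken from ICU.

```python
def hex_escape(byte):
    """Format one byte as a \\xHH escape, the form used in the Flex alias."""
    return "\\x{:02x}".format(byte)


def utf8_regex(codepoints):
    """Build a byte-level alternation matching the UTF-8 form of each codepoint."""
    # Group codepoints by every byte except the last, so that consecutive
    # final bytes under the same prefix can collapse into one range.
    groups = {}
    for cp in sorted(set(codepoints)):
        encoded = chr(cp).encode("utf-8")  # assumes no surrogate codepoints
        groups.setdefault(encoded[:-1], []).append(encoded[-1])

    alternatives = []
    for prefix, finals in groups.items():
        head = "".join(hex_escape(b) for b in prefix)
        start = prev = finals[0]
        # The trailing None flushes the last run.
        for b in finals[1:] + [None]:
            if b is not None and b == prev + 1:
                prev = b
                continue
            if start == prev:
                alternatives.append(head + hex_escape(start))
            else:
                alternatives.append(
                    head + "[" + hex_escape(start) + "-" + hex_escape(prev) + "]"
                )
            start = prev = b
    return "|".join(alternatives)


# Codepoints decoded from the whitespace alias: tab, space, U+00A0, U+1680,
# U+2000 through U+200A, U+202F, U+205F.
horizontal_ws = [0x09, 0x20, 0xA0, 0x1680, *range(0x2000, 0x200B), 0x202F, 0x205F]
pattern = utf8_regex(horizontal_ws)
print(pattern)
```

For this input the sketch prints the same alternation as the whitespace alias. It only merges ranges in the final byte, so a large class such as `\p{Han}` would come out much longer than necessary.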
Building requires a C99 compiler, ICU, and pkg-config.
./configure
make