A Python script for generating flexer-compatible regular expressions that capture all the Unicode code points in specific general categories.
This script was used in the development of the cooklang-c project.
- The script parses every code point listed in the all_unicode_chars.txt file and looks up each code point's general category.
- After some processing, the script collects the desired code points, as selected by the conditional statements in lines 50-56.
- Then, the script generates a flexer-compatible regular expression, formatted according to the line length and delimiter strings.
- Clone the repository.
- Customize the conditional statements in lines 50-56 to include the general categories for which you want to generate regular expressions.
- Edit the end of the script to print the generated expressions by using the `make_hex_string()` function.
- Run the script with `python unicode.py`.
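
The general idea behind the steps above can be sketched as follows. This is a minimal illustration, not the script's actual code: it uses Python's standard `unicodedata` module instead of reading all_unicode_chars.txt, and it assumes the expressions are written as UTF-8 byte escapes. The function names (`code_points_in_categories`, `utf8_hex_escape`, `build_alternation`) and their parameters are hypothetical and do not reproduce `make_hex_string()`.

```python
import unicodedata


def code_points_in_categories(wanted, start=0x0000, stop=0x0100):
    """Return code points in [start, stop) whose general category is in `wanted`.

    Assumption: categories are looked up with the standard library rather
    than parsed from all_unicode_chars.txt as the real script does.
    """
    selected = []
    for cp in range(start, stop):
        # Surrogates cannot be encoded as UTF-8, so skip them outright.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        if unicodedata.category(chr(cp)) in wanted:
            selected.append(cp)
    return selected


def utf8_hex_escape(cp):
    """Render one code point as its UTF-8 bytes in \\xHH form."""
    return "".join(f"\\x{byte:02X}" for byte in chr(cp).encode("utf-8"))


def build_alternation(code_points, per_line=8, delimiter="|"):
    """Join the escapes with `delimiter`, emitting `per_line` items per line.

    Both `per_line` and `delimiter` are hypothetical stand-ins for the
    line length and delimiter strings mentioned above.
    """
    escapes = [utf8_hex_escape(cp) for cp in code_points]
    return [
        delimiter.join(escapes[i:i + per_line])
        for i in range(0, len(escapes), per_line)
    ]


if __name__ == "__main__":
    # Example: currency symbols (general category "Sc") below U+0100.
    for line in build_alternation(code_points_in_categories({"Sc"})):
        print(line)
```

Changing the set passed to `code_points_in_categories` plays the same role as editing the conditional statements in the real script: it decides which general categories end up in the generated expression.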