behdad / use-syllables

Use Universal Shaping Engine code from HarfBuzz to segment text into syllables

USE Syllables

This code, extracted from HarfBuzz, uses HarfBuzz's Universal Shaping Engine (USE) implementation to segment a list of Unicode codepoints into syllables.

Build

Build it by running:

$ make

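Run it with the codepoints as hexadecimal arguments. Each output line gives a syllable's range of input indices and its type: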

$ ./main 10a00 10a10 10a01 10a10 10a01 10a01
syllable 0..1 standard_cluster
syllable 1..3 standard_cluster
syllable 3..6 standard_cluster
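
In the example above, six codepoints go in and the reported ranges are 0..1, 1..3 and 3..6, so each range reads as half-open: the start index is included and the end index is not. The following is a minimal sketch of how a caller might decode the hexadecimal arguments and map a reported range back to codepoints. The helper names parse_codepoint and syllable_slice are hypothetical and are not taken from this repository or from HarfBuzz.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not from the repository): parse one command-line
// argument such as "10a00" as a hexadecimal codepoint. Returns false if
// strtoul cannot consume the whole argument or the value is above U+10FFFF.
bool parse_codepoint(const std::string &arg, uint32_t *out)
{
  if (arg.empty()) return false;
  char *end = nullptr;
  unsigned long value = std::strtoul(arg.c_str(), &end, 16);
  if (*end != '\0' || value > 0x10FFFFul) return false;
  *out = static_cast<uint32_t>(value);
  return true;
}

// Hypothetical helper: return the codepoints covered by one reported
// range "start..end", assuming the range is half-open, i.e. [start, end).
std::vector<uint32_t> syllable_slice(const std::vector<uint32_t> &cps,
                                     std::size_t start, std::size_t end)
{
  assert(start <= end && end <= cps.size());
  return std::vector<uint32_t>(cps.begin() + start, cps.begin() + end);
}
```

Under that reading, the range 1..3 in the example covers the second and third arguments, 10a10 and 10a01.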

Caveats

  • HarfBuzz's USE implementation may use a more relaxed grammar than the USE specification.

  • HarfBuzz's USE implementation currently does not support the main Indic scripts. That is consistent with the spec, but it may come as a surprise. Apple already supports Indic scripts in USE, and we want to do the same; there is an issue for this in the HarfBuzz GitHub repository.
