dscorbett / rust-uax29

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

  • David Corbett <corbett.dav@husky.neu.edu>
  • Mimi Lin <hloople@gmail.com>

This is an implementation of a word-breaking algorithm in Rust following Unicode Standard Annex #29 as an extension of the C++ library boyers/unicode. It also includes a barebones C++ version of fmt.

README.md is this file.

fmt is the directory for the fmt program. In it are fmt.cpp (the source code) and Makefile. Run make to make it and ./fmt filename max_width to use it.

unicode is the modified copy of boyers/unicode. The modification is the addition of the two files word_break.cpp and word_break.h. These are used by fmt.

uax29 is the Rust project directory. uax29/Cargo.toml and uax29/Cargo.lock are Rust project metadata files.

In uax29/src, main.rs can be ignored. This is supposed to be a library, so it should not have a main file. lib.rs is the library file. tests.rs is a file of tests. defaults.rs is the primary file of the library. breaks.rs contains information about word sentence breaks. c_defaults.rs contains functions for using this library from C++.

uax29/scripts contains scripts. unicode.py creates breaks.rs. generate_tests.py creates tests.rs.

About


Languages

Language:C 59.9%Language:Rust 37.2%Language:C++ 1.7%Language:Python 1.1%Language:Shell 0.0%