daac-tools

daac-tools's repositories

vibrato

🎀 vibrato: Viterbi-based accelerated tokenizer

Language: Rust | License: Apache-2.0 | Stargazers: 301 | Issues: 7 | Issues: 19

vaporetto

🛥 Vaporetto: Very accelerated pointwise prediction based tokenizer

Language: Rust | License: Apache-2.0 | Stargazers: 217 | Issues: 3 | Issues: 5

daachorse

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure in Rust.

Language: Rust | License: Apache-2.0 | Stargazers: 189 | Issues: 3 | Issues: 5

find-simdoc

Finding all pairs of similar documents time- and memory-efficiently

Language: Rust | License: Apache-2.0 | Stargazers: 56 | Issues: 2 | Issues: 1

python-vibrato

Viterbi-based accelerated tokenizer (Python wrapper)

Language: Rust | License: Apache-2.0 | Stargazers: 34 | Issues: 2 | Issues: 1

trie-match

Fast match expression optimized for string comparison

Language: Rust | License: Apache-2.0 | Stargazers: 31 | Issues: 3 | Issues: 1

crawdad

🦞 Rust library of natural language dictionaries using character-wise double-array tries.

Language: Rust | License: Apache-2.0 | Stargazers: 27 | Issues: 2 | Issues: 2

python-vaporetto

🛥 Vaporetto is a fast and lightweight pointwise prediction based tokenizer. This is a Python wrapper for Vaporetto.

Language: Rust | License: Apache-2.0 | Stargazers: 20 | Issues: 2 | Issues: 1

python-daachorse

🐎 A fast implementation of the Aho-Corasick algorithm using the compact double-array data structure. (Python wrapper for daachorse)

Language: Rust | License: Apache-2.0 | Stargazers: 13 | Issues: 1 | Issues: 0

include-bytes-zstd

Includes a file with zstd compression in Rust

Language: Rust | License: Apache-2.0 | Stargazers: 9 | Issues: 2 | Issues: 0

rucrf

Conditional Random Fields implemented in pure Rust

Language: Rust | License: Apache-2.0 | Stargazers: 6 | Issues: 2 | Issues: 0

guidelines

Guidelines for daac-tools community

Stargazers: 0 | Issues: 2 | Issues: 0

vaporetto-models

Tokenization models and training scripts for Vaporetto fast tokenizer

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0