Rairye's repositories
zh-sentence
Light-weight sentence tokenizer for Chinese languages.
ja-sentence
Light-weight sentence tokenizer for Japanese.
js-sentence-tokenizers
JavaScript sentence tokenizers for multiple natural languages.
kr-sentence
Light-weight sentence tokenizer for Korean. Supports full-width and half-width punctuation marks.
sentence-tokenizers
Sentence tokenizers for several languages
thelangbot
Twitter bot to help you learn foreign languages. Building a community through tweets. Retweets #100DaysOfLanguage and #langtwt.
back-cleaner
Server-side Python tool for escaping script tags and converting characters into HTML entities (no regex).
content_moderation_ideas
A collection of proof-of-concept approaches for using ideas from NLP/text processing to handle content moderation. (Light-weight approaches, no ML)
convert-with-ents
Light-weight tool for converting characters in a string into common HTML entities (without regex).
freefields-from-string
Code for extracting field-like text from unformatted strings
gs-scripts
Samples of .gs scripts
rr-search-tries
Trie-based search classes for JavaScript
CPP-samples
C++ samples
js-mnl-punct-norm
Light-weight tool for removing punctuation. Supports multiple natural languages. Useful for scraping, machine learning, and data analysis.
js-mnl-ws-norm
Light-weight tool for normalizing whitespace and accurately tokenizing words. Multiple natural languages supported. Useful for scraping, machine learning, and data analysis.
ko-ww-stopwords
Set of whole-word (independent) stop words in Korean
mnl-punct-norm
Light-weight tool for removing punctuation. Supports multiple natural languages. Useful for scraping, machine learning, and data analysis.
mnl-ws-norm
Light-weight tool for normalizing whitespace and accurately tokenizing words (no regex). Multiple natural languages supported. Useful for scraping, machine learning, and data analysis.
RairyeTrieSample
Sample implementation of a trie (as an auto-complete dictionary).
sentence-tk-checker
Checks output of an English sentence tokenizer and modifies the output according to default or user-defined rules.
st-no-love
Tool for escaping script tags using backslashes (no regex).
TranslationQATools
Java / Swing / Apache POI tool for translation quality assurance.
TwoLanguageFormOutputFromSingleLanguageInput
(React Native, JavaScript) App for outputting forms in two languages from single-language user input.