akimdi / langchecker

How often do people forget to switch the keyboard layout? Meet a Java solution similar to Punto Switcher.


LangChecker

  LangChecker is a Java implementation of a well-known approach to detecting text typed in the wrong keyboard layout. Supported languages are Russian and English. The approach is based on n-grams: it relies on vocabularies of letter combinations that do not occur in a language. The algorithm is only as good as the vocabularies it is built on (accuracy figures are given in the Tests section below).
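To illustrate the n-gram idea, here is a minimal sketch. It is not LangChecker's actual code: the class `NgramSketch`, its method `looksValid`, and the tiny four-word vocabulary are all hypothetical. Instead of storing the nonexistent combinations directly, it stores the n-grams seen in a vocabulary and treats any unseen n-gram as nonexistent, which is the complementary view of the same check.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not part of the LangChecker API.
final class NgramSketch {
    private final int n;
    private final Set<String> seen = new HashSet<>();

    // Collect every n-gram that occurs in the vocabulary.
    NgramSketch(int n, List<String> vocabulary) {
        this.n = n;
        for (String word : vocabulary) {
            seen.addAll(ngrams(word));
        }
    }

    // All substrings of length n; empty for words shorter than n.
    private Set<String> ngrams(String word) {
        Set<String> result = new HashSet<>();
        String w = word.toLowerCase(Locale.ROOT);
        for (int i = 0; i + n <= w.length(); i++) {
            result.add(w.substring(i, i + n));
        }
        return result;
    }

    // A word looks valid if it contains no n-gram absent from the vocabulary.
    // Words shorter than n have no n-grams and are therefore accepted.
    boolean looksValid(String word) {
        return seen.containsAll(ngrams(word));
    }

    public static void main(String[] args) {
        // Toy vocabulary (assumption); a real one would hold many thousands of words.
        NgramSketch en = new NgramSketch(2, List.of("hello", "world", "help", "lord"));
        System.out.println(en.looksValid("held"));   // true: he, el, ld were all seen
        System.out.println(en.looksValid("ghbdtn")); // false: "gh" was never seen
    }
}
```

With such a small vocabulary most real English words would be rejected too, which shows why the quality of the result depends so heavily on how the vocabularies are built.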

LangChecker is implemented as a tokenizer. Why? Because some keys that produce letters in the Russian layout produce separators in the English layout (for example, ыендубьгышс --> style,music and cj,snbt --> событие). LangChecker can check not only a single word but also a whole phrase.
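The separator problem is easy to see with a plain key-for-key conversion between the standard QWERTY and ЙЦУКЕН layouts. The sketch below is an assumption-laden illustration, not LangChecker's code: the class `LayoutSketch` and its methods `enToRu` and `ruToEn` are hypothetical, and only lowercase keys are mapped.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of the LangChecker API.
final class LayoutSketch {
    // Same physical keys, listed row by row, in both layouts (lowercase only).
    private static final String EN = "qwertyuiop[]asdfghjkl;'zxcvbnm,.";
    private static final String RU = "йцукенгшщзхъфывапролджэячсмитьбю";

    private static final Map<Character, Character> EN_TO_RU = new HashMap<>();
    private static final Map<Character, Character> RU_TO_EN = new HashMap<>();

    static {
        for (int i = 0; i < EN.length(); i++) {
            EN_TO_RU.put(EN.charAt(i), RU.charAt(i));
            RU_TO_EN.put(RU.charAt(i), EN.charAt(i));
        }
    }

    // Replace each mapped character; anything else (spaces, digits, uppercase) is kept as-is.
    private static String convert(String input, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            out.append(table.getOrDefault(c, c));
        }
        return out.toString();
    }

    static String enToRu(String input) {
        return convert(input, EN_TO_RU);
    }

    static String ruToEn(String input) {
        return convert(input, RU_TO_EN);
    }

    public static void main(String[] args) {
        System.out.println(ruToEn("ыендубьгышс")); // style,music
        System.out.println(enToRu("cj,snbt"));     // событие
    }
}
```

In the first example the Russian letter б becomes a comma that separates two English words, and in the second the comma is really the letter б inside one Russian word, so splitting the input on punctuation before checking the layout would give the wrong result.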

The implementation depends on Immutables.org.

Usage

Tokenizer tokenizer = LangSwitcherTokenizer.create();
System.out.println(tokenizer.tokenize("hello word руддщ цщкв"));
System.out.println(tokenizer.tokenize("примет мир ghbdtn vbh"));

The tokenize(String input) method returns an instance of TokenizerResponse. It contains the original phrase, the corrected phrase, and a list of tokens (the parts of the phrase that were recognized as words).

Tests

This test shows how well the algorithm distinguishes wrong words from correct ones. Vocabularies of 109582 English and 92453 Russian words were used for the tests.

| Result | EN | RU | Meaning |
|---|---|---|---|
| positive | 99.97% | 99.99% | share of correct words that were recognized as correct |
| false negative | 0.03% | 0.01% | share of correct words that were recognized as wrong |
| negative | 98.26% | 97.81% | share of wrong words that were recognized as wrong |
| false positive | 1.74% | 2.19% | share of wrong words that were recognized as correct |

Correct words are words from the vocabulary; wrong words are the same vocabulary words typed in the wrong keyboard layout.

License

Apache License, Version 2.0
