komodojp / tinyld

Simple and Performant Language detection library for NodeJS

Home Page:https://komodojp.github.io/tinyld/

"hello" is 0.62% English

winrid opened this issue

As the title says, even with TinyLD Heavy, the word "hello" is only 0.62% English...

Yes, and this is by design. Because it uses a statistical approach, it needs a certain number of characters (~40) to work with. It cannot work with one or two words and never will.
https://github.com/komodojp/tinyld/blob/develop/docs/faq.md#can-tinyld-identify-short-strings
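
One way a caller could act on this limit is to check the input length before trusting any statistical result. The sketch below is only an illustration: `Detector`, `detectIfLongEnough`, and `MIN_CHARS` are hypothetical names, not part of TinyLD's API, and the threshold simply reuses the ~40 characters mentioned above.

```typescript
// Hypothetical detector signature (an assumption, not TinyLD's API):
// takes text and returns a language code, or '' when it has no answer.
type Detector = (text: string) => string;

// Threshold based on the ~40 characters mentioned above; tune as needed.
const MIN_CHARS = 40;

// Returns the detected language, or null when the input is too short
// for a statistical detector to give a meaningful answer.
function detectIfLongEnough(
  text: string,
  detect: Detector,
  minChars: number = MIN_CHARS
): string | null {
  const trimmed = text.trim();
  if (trimmed.length < minChars) {
    return null; // too short: do not trust a statistical guess
  }
  const lang = detect(trimmed);
  return lang === '' ? null : lang;
}
```

Returning `null` for short input lets the caller fall back to another strategy instead of acting on a low-confidence score.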

I guess for the moment your best hope is that someone makes a good AI for language detection.

I see. I figured you had compressed dictionaries of common words or some such... I will just do this myself. I need to determine language from just one word sometimes, as it's used to implement a language whitelist.
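
A minimal sketch of that do-it-yourself approach, assuming small hand-written word lists per language. The lists, `languagesForWord`, and `isWordAllowed` are all made up for illustration; a real whitelist would need much larger dictionaries and a policy for words shared between languages.

```typescript
// Illustrative word lists only (an assumption, not real dictionary data).
const COMMON_WORDS: Array<{ lang: string; words: Set<string> }> = [
  { lang: 'en', words: new Set(['hello', 'the', 'and', 'thanks', 'yes']) },
  { lang: 'fr', words: new Set(['bonjour', 'le', 'et', 'merci', 'oui']) },
  { lang: 'es', words: new Set(['hola', 'el', 'y', 'gracias', 'no']) },
];

// Returns every language whose list contains the word.
// The result may be empty (unknown word) or hold several languages (ambiguous word).
function languagesForWord(word: string): string[] {
  const key = word.trim().toLowerCase();
  return COMMON_WORDS
    .filter((entry) => entry.words.has(key))
    .map((entry) => entry.lang);
}

// A word passes the whitelist when at least one candidate language is allowed.
function isWordAllowed(word: string, allowed: Set<string>): boolean {
  return languagesForWord(word).some((lang) => allowed.has(lang));
}
```

For example, `isWordAllowed('Hello', new Set(['en']))` is true with the lists above, while `isWordAllowed('bonjour', new Set(['en']))` is false. Unknown words are rejected here; whether to reject or accept them is a design choice for the whitelist.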