Mimino666 / langdetect

Port of Google's language-detection library to Python.

If sentence is all uppercase, it gives wrong results.

JaViLuMa opened this issue

Hello. I had a task to detect languages for certain sentences.

Let's say we have this sentence:
ZANIMA ME CENA PREMIUM HIŠIC, BLIZU MORJA, IMAMO TUDI PSA.

This is the output:

Screenshot_15

But if I convert it to sentence case (Zanima me cena hišic, blizu morja, imamo tudi psa.), the output is MUCH different:

Screenshot_16

I know this is easy to work around, but I don't think this behavior is, or ever was, intended.

Has anyone found anything better than detect(TEXT_with_Capital_Letters.lower())?

I think it will almost never degrade accuracy if we make the string lower-case before feeding it into the algorithm.
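For anyone who wants the workaround in one place, here is a minimal sketch of the lowercase-before-detect idea discussed above. The helper name detect_lowercased is made up for illustration and is not part of langdetect; the detector is passed in as a parameter so the wrapper can be tried with any function that maps a string to a language code.

```python
from typing import Callable


def detect_lowercased(text: str, detect_fn: Callable[[str], str]) -> str:
    """Lowercase the text, then hand it to the given detector.

    `detect_lowercased` is a hypothetical helper, not a langdetect API.
    `detect_fn` is any callable taking a string and returning a language code.
    """
    # str.lower() is Unicode-aware, so "HIŠIC" becomes "hišic".
    normalized = text.lower()
    return detect_fn(normalized)


# Usage with langdetect (assumes the package is installed):
#
#     from langdetect import DetectorFactory, detect
#     DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
#     detect_lowercased("ZANIMA ME CENA PREMIUM HIŠIC, BLIZU MORJA, IMAMO TUDI PSA.", detect)
```

This is the same thing as calling detect(text.lower()) directly; the wrapper only keeps the normalization in one spot so it is not forgotten at individual call sites.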