If sentence is all uppercase, it gives wrong results.

Question

JaViLuMa opened this issue 4 years ago · comments

Hello. I had a task to detect languages for certain sentences.

Let's say we have this sentence:
ZANIMA ME CENA PREMIUM HIŠIC, BLIZU MORJA, IMAMO TUDI PSA.

This is the output:

But if I convert it to sentence case (Zanima me cena hišic, blizu morja, imamo tudi psa.), the output is MUCH different:

I know this is easy to work around, but I don't think this behavior is, or ever was, intended.

Mohammad Ali Dastgheib · Answer 1 · Mon May 10 2021 21:44:03 GMT+0800 (China Standard Time)

Has anyone found anything better than detect(TEXT_with_Capital_Letters.lower())?

I think lowercasing the string before feeding it into the algorithm will almost never degrade accuracy.
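
For anyone who wants to apply this in one place, here is a minimal sketch of the workaround as a small wrapper. The name detect_normalized is made up for illustration, and the detector is passed in as a parameter, so the sketch only assumes that your detect function takes a string and returns a language code, as in the call above. It is not a fix inside the library itself.

```python
from typing import Callable


def detect_normalized(text: str, detect: Callable[[str], str]) -> str:
    """Lowercase the text, then hand it to the language detector.

    `detect` is whatever detection function you already use. It is a
    parameter here so the sketch does not assume a specific library.
    """
    return detect(text.lower())


if __name__ == "__main__":
    # Stand-in detector for demonstration only: it echoes its input,
    # which shows the string the real detector would receive.
    sentence = "ZANIMA ME CENA PREMIUM HIŠIC, BLIZU MORJA, IMAMO TUDI PSA."
    print(detect_normalized(sentence, lambda s: s))
```

In real use you would pass your actual detect function as the second argument instead of the echo lambda. Whether lowercasing helps for every language is something to check on your own data.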