Mimino666 / langdetect

Port of Google's language-detection library to Python.

Inaccurate predictions for basic English words

grestonian opened this issue

The library fails to detect the language of basic English words and returns inaccurate results, as shown below.
detect("sunday") => 'id' | whereas 'Sunday' in Indonesian is 'minggu'
detect("monday") => 'tr' | whereas 'Monday' in Turkish is 'pazartesi'
And surprisingly, detect("pazartesi") => 'es'

In fact,
langdetect.detect_langs("sunday") outputs confidences for 'tr' and 'id', with no mention of English whatsoever.
The same goes for months and other basic English words, e.g.
detect("good") => 'so'

"son", "song",...
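
For anyone who wants to reproduce this, here is a minimal sketch, assuming langdetect is installed. The helper names collect_predictions and run_with_langdetect are hypothetical and not part of the library; detect_langs and DetectorFactory.seed are the library's own entry points. The seed is fixed because the detection algorithm is non-deterministic, so very short inputs can give different answers from run to run.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def collect_predictions(
    words: Iterable[str],
    detect_langs_fn: Callable[[str], list],
) -> Dict[str, List[Tuple[str, float]]]:
    """Map each word to its (language code, probability) candidates.

    detect_langs_fn is expected to behave like langdetect.detect_langs:
    it returns objects exposing `.lang` and `.prob` attributes.
    """
    results = {}
    for word in words:
        candidates = detect_langs_fn(word)
        results[word] = [(c.lang, c.prob) for c in candidates]
    return results


def run_with_langdetect() -> None:
    """Print candidates for the words from this issue (needs langdetect)."""
    # Imported here so the helper above stays usable without the package.
    from langdetect import DetectorFactory, detect_langs

    # Fix the seed so repeated runs give the same candidates.
    DetectorFactory.seed = 0

    words = ["sunday", "monday", "pazartesi", "good"]
    for word, candidates in collect_predictions(words, detect_langs).items():
        print(word, candidates)
```

Calling run_with_langdetect() prints the full candidate list for each word, which makes it easy to see whether 'en' appears at all rather than only looking at the top result from detect().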