Inaccurate predictions for basic English words
grestonian opened this issue
The library fails to detect the language of basic English words and returns inaccurate results, as shown below:
detect("sunday")
=> 'id' | even though "sunday" in Indonesian is 'minggu'
detect("monday")
=> 'tr' | even though "monday" in Turkish is 'pazartesi'
Surprisingly, the Turkish word itself is not recognized either:
detect('pazartesi')
=> 'es'
In fact,
langdetect.detect_langs("sunday")
returns probabilities only for 'tr' and 'id', with no mention of English at all.
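Until single-word input is handled better, one possible caller-side workaround is to decline to guess when the text is very short or the top probability is low. This is a minimal sketch, not part of langdetect: the function name `cautious_detect`, the `Guess` stand-in, and both thresholds (`min_chars`, `min_prob`) are hypothetical. It assumes only that each result exposes `lang` and `prob` attributes, as the objects returned by `detect_langs` do.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Guess:
    """Hypothetical stand-in for a langdetect result; only `lang` and `prob` are used."""
    lang: str
    prob: float


def cautious_detect(
    text: str,
    detect_langs: Callable[[str], Sequence],
    min_chars: int = 20,
    min_prob: float = 0.9,
) -> Optional[str]:
    """Return the most probable language code, or None if the guess looks unreliable.

    The default thresholds are illustrative assumptions, not values recommended by langdetect.
    """
    # Single words such as "sunday" give the detector very little evidence,
    # so decline to guess instead of returning an arbitrary language.
    if len(text.strip()) < min_chars:
        return None
    guesses = detect_langs(text)
    if not guesses:
        return None
    # Pick the candidate with the highest probability.
    top = max(guesses, key=lambda g: g.prob)
    # A low top probability means no candidate clearly wins; treat it as unknown.
    return top.lang if top.prob >= min_prob else None


def fake_detect(text: str) -> list:
    """Stub detector with made-up probabilities, so the demo runs without langdetect."""
    return [Guess("en", 0.95), Guess("nl", 0.05)]


print(cautious_detect("sunday", fake_detect))  # None: too short to judge
print(cautious_detect("The meeting is on Sunday afternoon.", fake_detect))  # en
```

With langdetect installed, the real detector would be passed in as `cautious_detect(text, langdetect.detect_langs)`. The langdetect README also notes that detection is non-deterministic for short or ambiguous text and suggests setting `DetectorFactory.seed = 0` before the first call; doing so makes repeated runs of reproductions like the ones above comparable.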
The same goes for month names and other basic English words, e.g.
detect("good")
=> 'so'
and likewise "son", "song", ...
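A plausible reason single words fare so badly is how little evidence they carry. langdetect (a port of Nakatani Shuyo's language-detection library) scores text using character n-grams of length 1 to 3, and a single word yields only a handful of them, many of which also occur in other languages. The sketch below is a simplified illustration under that assumption, not langdetect's actual extraction code (which also normalizes characters and samples n-grams randomly); the helper `char_ngrams` is hypothetical.

```python
def char_ngrams(text: str, max_n: int = 3) -> list[str]:
    """Collect character n-grams of length 1..max_n from a space-padded, lowercased string.

    Simplified illustration only; it does not reproduce langdetect's normalization.
    """
    padded = f" {text.lower()} "
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(padded) - n + 1):
            gram = padded[i : i + n]
            if gram.strip():  # skip n-grams made only of spaces
                grams.append(gram)
    return grams


for sample in ["sunday", "good", "Sunday is the last day of the weekend."]:
    grams = char_ngrams(sample)
    print(f"{sample!r}: {len(grams)} n-grams, {len(set(grams))} distinct")
```

For "sunday" this gives 19 n-grams and for "good" only 13, compared with well over a hundred for the short sentence, so a few n-grams that happen to be common in Turkish, Indonesian, or Somali profiles can easily outweigh the English ones.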