Mimino666 / langdetect

Port of Google's language-detection library to Python.

Looks like langdetect is getting fooled by bytes

Fratso opened this issue

Hi,
I tried to use it as a plaintext detector, to check whether it could tell an English sentence apart from a randomly deciphered string.

Here's an example:

>>> from langdetect import detect
>>> from langdetect import detect_langs

>>> deciphered_string = b'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'
>>> deciphered_string.decode("utf-8")
'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'

>>> detect_langs(deciphered_string.decode("utf-8"))
[en:0.999994546875217]
>>> detect(deciphered_string.decode("utf-8"))
'en'

I expected the function to raise an error rather than return a wrong result.
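
One possible workaround, sketched here as an assumption rather than anything langdetect provides, is to reject candidates that contain control characters before handing them to `detect`. The helper name `looks_like_text` and its `max_control_ratio` parameter are hypothetical; only the standard library is used.

```python
import unicodedata


def looks_like_text(s, max_control_ratio=0.0):
    """Hypothetical pre-filter, not part of langdetect.

    Returns False for empty strings and for strings whose share of
    control characters (Unicode category "Cc", excluding common
    whitespace) exceeds max_control_ratio.
    """
    if not s:
        return False
    allowed = {"\n", "\r", "\t"}
    control = sum(
        1 for ch in s
        if unicodedata.category(ch) == "Cc" and ch not in allowed
    )
    return control / len(s) <= max_control_ratio


# The deciphered string from the report above.
candidate = (
    b'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04'
    b'WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'
).decode("utf-8")

print(looks_like_text(candidate))                            # False
print(looks_like_text("This is a plain English sentence."))  # True
```

With such a check in place, `detect` or `detect_langs` would only be called when `looks_like_text` returns True. This filters out byte noise like the string above, but it does not guarantee that the remaining text is natural language.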