Looks like langdetect is getting fooled by bytes

Question

Fratso opened this issue 3 years ago

Hi,
I tried to use langdetect as a plaintext detector, to check whether it could tell an English sentence apart from a randomly deciphered string.

Here's an example:

>>> from langdetect import detect
>>> from langdetect import detect_langs

>>> deciphered_string = b'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'
>>> deciphered_string.decode("utf-8")
'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'

>>> detect_langs(deciphered_string.decode("utf-8"))
[en:0.999994546875217]
>>> detect(deciphered_string.decode("utf-8"))
'en'

I expected the function to raise an error rather than return a wrong result.
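
One possible workaround, sketched under assumptions rather than taken from langdetect itself, is to reject strings that contain many control characters before passing them to `detect`. The helper name `looks_like_text` and the 5% threshold are hypothetical choices for illustration; the check only uses the standard library and does not call langdetect.

```python
import unicodedata


def looks_like_text(s, max_control_ratio=0.05):
    """Return True if `s` is non-empty and has few control characters.

    Hypothetical pre-filter: the 5% threshold is an arbitrary assumption,
    not something langdetect defines.
    """
    if not s:
        return False
    # Common whitespace controls are fine in ordinary text.
    allowed = {"\n", "\r", "\t"}
    control = sum(
        1 for ch in s
        if unicodedata.category(ch) == "Cc" and ch not in allowed
    )
    return control / len(s) <= max_control_ratio


deciphered_string = (
    b'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04'
    b'WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'
)
candidate = deciphered_string.decode("utf-8")

# The string from the example above fails the check, so it would never
# reach the language detector.
print(looks_like_text(candidate))
print(looks_like_text("This is a plain English sentence."))
```

With a filter like this, `detect(candidate)` would only be called when `looks_like_text(candidate)` is true. It does not make the detector itself stricter; it just screens out input that is obviously not plaintext.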