Looks like langdetect is getting fooled by bytes
Fratso opened this issue
Fratso commented
Hi,
I tried to use langdetect as a plaintext detector, to check whether it could tell an English sentence apart from a randomly deciphered string.
Here's an example:
>>> from langdetect import detect
>>> from langdetect import detect_langs
>>> deciphered_string = b'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'
>>> deciphered_string.decode("utf-8")
'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP\x04R\x07TRT\x02\x04WSVPQRS'
>>> detect_langs(deciphered_string.decode("utf-8"))
[en:0.999994546875217]
>>> detect(deciphered_string.decode("utf-8"))
'en'
I expected the function to raise an error rather than return a wrong result.
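A possible workaround, as a minimal sketch: reject candidates that contain control characters before handing them to langdetect at all. The helper names `control_char_ratio` and `looks_like_text` and the `max_ratio` threshold are hypothetical, not part of langdetect, and the check uses only the standard library. It only catches non-printable characters, so a string of random printable letters would still pass and would need a further check, such as a threshold on the probability returned by `detect_langs`.

```python
def control_char_ratio(text: str) -> float:
    """Fraction of characters that are non-printable, ignoring tab/newline/CR."""
    if not text:
        # Treat empty input as "not text" so callers reject it.
        return 1.0
    allowed_whitespace = "\t\n\r"
    bad = sum(
        1 for ch in text
        if not ch.isprintable() and ch not in allowed_whitespace
    )
    return bad / len(text)


def looks_like_text(text: str, max_ratio: float = 0.0) -> bool:
    """Hypothetical pre-filter: True if the control-character share is acceptable."""
    return control_char_ratio(text) <= max_ratio


deciphered_string = (
    b'Q\x04RWUV\x04YTXS\x05RTTPU\x00QYPSURTYSTRW\x04\x05R\x05\x04WVRUQTXQQP'
    b'\x04R\x07TRT\x02\x04WSVPQRS'
)
candidate = deciphered_string.decode("utf-8")

print(looks_like_text(candidate))                            # False
print(looks_like_text("This is a plain English sentence."))  # True
```

Only candidates for which `looks_like_text` returns True would then be passed to `detect` or `detect_langs`.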