UTF8 '…' is misdetected
g2p opened this issue · comments
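b'\xe2\x80\xa6' is the UTF-8 encoding of '…' (U+2026). chardet misdetects it as Big5, so decoding with the detected encoding fails with a UnicodeDecodeError. charade takes it to be ISO-8859-2, which is also wrong, but the Python codec decodes the bytes without raising an error, so the mistake goes undetected.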
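To see why one wrong guess is loud and the other is silent, here is a minimal sketch that uses only Python's built-in codecs. It does not call chardet or charade; it just decodes the same three bytes with each of the encodings named above. The helper `try_decode` is a hypothetical name introduced for this illustration.

```python
# The three bytes from the report: the UTF-8 encoding of U+2026 (horizontal ellipsis).
data = b"\xe2\x80\xa6"


def try_decode(raw, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails.

    Hypothetical helper for this illustration only.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


# Decode with the correct encoding and with the two encodings named in the report.
results = {enc: try_decode(data, enc) for enc in ("utf-8", "big5", "iso-8859-2")}

for enc, outcome in results.items():
    print(enc, repr(outcome))
```

With the correct encoding the result is the single character '…'. The Big5 codec rejects the byte sequence and raises, while the ISO-8859-2 codec accepts every byte and returns three unrelated characters, so only the Big5 misdetection surfaces as an exception.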
NO LONGER MAINTAINED. USE chardet/chardet. Fork of chardet to support Python 2 and 3 in one code base.