Encoding issue with Linux log
asrmnw opened this issue · comments
Not sure whether this is an issue on your end. But when I try to read the current Linux.log (zenodo, md5:6d1802d7778126f21c001c6aa7b6b106) with Python, I get:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 20: invalid start byte
Can you confirm that, or is something going wrong on my side?
my fault. never mind
Sorry for the back and forth. Even after fixing my mistake, I still get a UnicodeDecodeError. This time:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf7 in position 4536: invalid start byte
This is one place I found when opening the original decompressed log file Linux.log in vim:
It requires me to use the errors= option of Python's open function to read the file without an exception.
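For anyone hitting the same error, here is a minimal sketch of what that workaround might look like. The helper name read_log_lossy is made up for illustration, and it assumes the file is mostly UTF-8 with a few stray bytes. With errors="replace", each undecodable byte becomes the replacement character U+FFFD instead of raising UnicodeDecodeError.

```python
def read_log_lossy(path, errors="replace"):
    """Read a log file as UTF-8 without raising on invalid bytes.

    Hypothetical helper, not part of the dataset or any library.
    errors="replace" substitutes U+FFFD for each undecodable byte;
    errors="ignore" would drop those bytes silently instead.
    """
    with open(path, encoding="utf-8", errors=errors) as f:
        # Strip only the trailing newline so the rest of each row is preserved.
        return [line.rstrip("\n") for line in f]
```

Calling read_log_lossy("Linux.log") should then return every row, with the damaged characters visibly marked so they can be inspected later.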
If you use raw logs from production, such errors are not uncommon. Please just skip such rows if there are not too many of them.
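One possible way to skip such rows, as a rough sketch: read the file in binary mode, decode each row separately, and drop the rows that fail. The function name and the returned skip count are assumptions for illustration. The count is there so you can check that the number of dropped rows is actually small.

```python
def read_log_skipping_bad_rows(path, encoding="utf-8"):
    """Return (rows, skipped): decodable rows and the count of dropped rows.

    Hypothetical helper. Rows that are not valid in the given encoding
    are skipped entirely rather than partially repaired.
    """
    rows = []
    skipped = 0
    with open(path, "rb") as f:
        # Iterating a binary file yields one bytes object per b"\n"-terminated row.
        for raw in f:
            try:
                rows.append(raw.decode(encoding).rstrip("\r\n"))
            except UnicodeDecodeError:
                skipped += 1
    return rows, skipped
```

Compared with passing errors= to open, this keeps only rows that decoded cleanly, at the cost of losing the damaged rows completely.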