Unicode Decode Error when running the LRS2 data preparation

Question

gak97 opened this issue a year ago

Thank you for providing the training code for Auto AVSR.

I am running into an issue when running preprocess_lrs2lrs3.py on the LRS2 dataset. I get the following error:

Traceback (most recent call last):
  File "preprocess_lrs2lrs3.py", line 77, in <module>
    text_transform = TextTransform()
  File "A:\Projects\auto_avsr\preparation\transforms.py", line 152, in __init__
    units = open(dict_path).read().splitlines()
  File "C:\Users\Girish\anaconda3\envs\autoavsr\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4416: character maps to <undefined>

Any help to resolve this would be greatly appreciated!

Pingchuan Ma · Answer 1 · Mon Jul 03 2023 21:39:20 GMT+0800 (China Standard Time)

Hi @gak97, could you explicitly specify the encoding when opening the file on line 152 of "A:\Projects\auto_avsr\preparation\transforms.py"? Specifically, please try changing line 152 to units = open(dict_path, encoding='utf8').read().splitlines() and see if it works.
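
For context, a minimal sketch of why this change likely helps. When open() is called without an encoding argument, Python falls back to the locale's preferred encoding, which on this Windows setup is cp1252 according to the traceback, and byte 0x81 has no mapping in cp1252. The sketch assumes the dictionary file is stored as UTF-8, which the thread does not state explicitly. The helper name read_units and the sample file contents are hypothetical, used only for illustration; they are not part of the auto_avsr code.

```python
import os
import tempfile


def read_units(dict_path):
    """Hypothetical helper: read a unit dictionary with an explicit encoding."""
    # Passing encoding explicitly avoids the platform-dependent default.
    with open(dict_path, encoding="utf-8") as f:
        return f.read().splitlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "units.txt")

    # Made-up sample content. U+00C1 is encoded in UTF-8 as the bytes
    # C3 81, so the file contains the byte 0x81 from the traceback.
    with open(path, "w", encoding="utf-8") as f:
        f.write("<unk>\n\u00c1\n")

    # Reading with the explicit UTF-8 encoding succeeds.
    units = read_units(path)

    # Reading the same file as cp1252 reproduces the decode error.
    try:
        with open(path, encoding="cp1252") as f:
            f.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

print(units, cp1252_failed)
```

Specifying the encoding at the call site keeps the behaviour the same across operating systems, instead of depending on each machine's locale settings.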

Girish Koushik · Answer 2 · Mon Jul 03 2023 23:04:45 GMT+0800 (China Standard Time)

Hi @mpc001, that resolved the error! Closing this issue here.