mpc001 / auto_avsr

Auto-AVSR: Lip-Reading Sentences Project

Unicode Decode Error when running the LRS2 data preparation

gak97 opened this issue

Thank you for providing the training code for the Auto AVSR.

I am running into an issue when running preprocess_lrs2lrs3.py on the LRS2 dataset. I get the following error:

Traceback (most recent call last):
  File "preprocess_lrs2lrs3.py", line 77, in <module>
    text_transform = TextTransform()
  File "A:\Projects\auto_avsr\preparation\transforms.py", line 152, in __init__
    units = open(dict_path).read().splitlines()
  File "C:\Users\Girish\anaconda3\envs\autoavsr\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 4416: character maps to <undefined>

Any help to resolve this would be greatly appreciated!

Hi @gak97, could you explicitly specify the encoding when opening the file on line 152 of "A:\Projects\auto_avsr\preparation\transforms.py"? Specifically, please try changing line 152 to units = open(dict_path, encoding='utf8').read().splitlines() and see if it works.
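
For readers hitting the same error, here is a minimal sketch of why the suggested change helps. When open() is called in text mode without an encoding argument, Python uses the platform's preferred encoding, which the traceback shows was cp1252 on this machine. Byte 0x81 has no mapping in cp1252, so decoding fails, while the same byte can be a valid part of a multi-byte UTF-8 sequence. The helper name read_units, the file name units.txt, and the sample contents are hypothetical stand-ins for illustration, not the repository's actual code or dictionary file.

```python
import os
import tempfile


def read_units(dict_path, encoding="utf-8"):
    """Read one unit per line, decoding with an explicit encoding.

    Hypothetical helper mirroring the one-line fix suggested above.
    """
    with open(dict_path, encoding=encoding) as f:
        return f.read().splitlines()


# Build a small stand-in dictionary file (assumed contents, for illustration).
# U+00C1 is encoded in UTF-8 as the bytes 0xC3 0x81, so the file contains 0x81.
tmp_dir = tempfile.mkdtemp()
dict_path = os.path.join(tmp_dir, "units.txt")
with open(dict_path, "w", encoding="utf-8") as f:
    f.write("a\nb\n\u00c1\n")

# Decoding as cp1252 reproduces the failure: 0x81 is undefined in that code page.
try:
    read_units(dict_path, encoding="cp1252")
    cp1252_failed = False
except UnicodeDecodeError as err:
    cp1252_failed = True
    print("cp1252 failed:", err)

# Decoding as UTF-8 reads the file as intended.
units = read_units(dict_path)
print(units)
```

Passing the encoding explicitly makes the read behave the same on every platform, rather than depending on the locale of the machine that runs the script.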

Hi @mpc001, that resolved the error! Closing this issue here.