krasserm / perceiver-io

A PyTorch implementation of Perceiver, Perceiver IO and Perceiver AR with PyTorch Lightning scripts for distributed training

text encoding error

batrlatom opened this issue

Hi,
I am getting this error when running train/train_mlm.py:

Traceback (most recent call last):
  File "train/train_mlm.py", line 113, in <module>
    main(parser.parse_args())
  File "train/train_mlm.py", line 69, in main
    data_module.setup()
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/core/datamodule.py", line 428, in wrapped_fn
    fn(*args, **kwargs)
  File "/opt/perceiver-io/data/imdb.py", line 131, in setup
    self.ds_train = IMDBDataset(root=self.root, split='train')
  File "/opt/perceiver-io/data/imdb.py", line 42, in __init__
    self.raw_x, self.raw_y = load_split(root, split)
  File "/opt/perceiver-io/data/imdb.py", line 34, in load_split
    raw_x.append(f.read())
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 449: ordinal not in range(128)

It is probably related to Unicode encoding: the file is being decoded with the ASCII codec, but it contains non-ASCII bytes.

@batrlatom thanks for reporting, this should be fixed now. Please re-open this ticket if the problem persists.
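
The thread does not show the actual change, so the following is only a sketch of how this kind of error is commonly avoided, not the repository's fix. On Python 3.6, `open()` without an `encoding` argument falls back to the locale's preferred encoding, which can be ASCII in a container with no locale configured. Passing `encoding="utf-8"` explicitly removes that dependency. The helper `read_text`, the function `demo`, and the sample file below are hypothetical and exist only for illustration; the assumption is that the IMDB text files are UTF-8.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Hypothetical helper (not the repository's load_split).

    Reads a text file with an explicit encoding instead of relying on
    the locale default, which may be ASCII in some environments.
    """
    with open(path, encoding=encoding) as f:
        return f.read()


def demo():
    # U+00A9 (copyright sign) encodes to the bytes 0xc2 0xa9 in UTF-8,
    # so the file contains the same 0xc2 byte seen in the traceback.
    sample = "A great film \u00a9 2001"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "review.txt")
        with open(path, "wb") as f:
            f.write(sample.encode("utf-8"))

        # Decoding as ASCII reproduces the UnicodeDecodeError.
        try:
            read_text(path, encoding="ascii")
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True

        # Decoding as UTF-8 succeeds regardless of the locale.
        return read_text(path), ascii_failed


if __name__ == "__main__":
    text, ascii_failed = demo()
    print(repr(text), ascii_failed)
```

If the data were in a different encoding, the same pattern would apply with that encoding's name passed in.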

It works now. Thanks!