increpare / utf-8-bug-report-for-opennmt

How to reproduce the bug

onmt_build_vocab -config .\config.yaml
onmt_train -config .\config.yaml

Running these two commands produces the error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 0: invalid continuation byte

Changing build_vocab so that it opens the vocab file with an explicit encoding fixes it:

with open(save_path, "w", encoding="utf8") as fo:
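
A minimal sketch of why the explicit encoding likely matters. Without an encoding argument, Python's open() uses the platform's locale encoding, which on many Windows setups is not UTF-8. The sketch assumes that encoding is cp1252 and simulates it explicitly so it runs on any OS. The helper names (write_vocab, read_vocab, roundtrip) and the sample tokens are hypothetical and are not taken from OpenNMT.

```python
import os
import tempfile


def write_vocab(path, tokens, encoding):
    # Hypothetical stand-in for the vocab-saving step in build_vocab.
    with open(path, "w", encoding=encoding) as fo:
        for tok in tokens:
            fo.write(tok + "\n")


def read_vocab(path):
    # Hypothetical stand-in for the training step, which reads UTF-8.
    with open(path, "r", encoding="utf-8") as fi:
        return [line.rstrip("\n") for line in fi]


def roundtrip(tokens, write_encoding):
    # Write with the given encoding, then read the file back as UTF-8.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "vocab.txt")
        write_vocab(path, tokens, write_encoding)
        return read_vocab(path)


if __name__ == "__main__":
    tokens = ["ãb", "hello"]

    # Explicit UTF-8 on write: the file reads back cleanly.
    print(roundtrip(tokens, "utf8"))

    # Assumed Windows default (cp1252): "ã" is stored as the single byte
    # 0xe3, which is not valid UTF-8 when followed by an ASCII byte.
    try:
        roundtrip(tokens, "cp1252")
    except UnicodeDecodeError as err:
        print("decode failed:", err)
```

If this is what happens here, any vocab entry containing a non-ASCII character would trigger the failure, and it would only show up on machines whose locale encoding is not UTF-8.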
