Unicode

mahsash opened this issue 7 years ago

Hi.
I am having problems training my own model on a Persian dataset. It raises an error at the beginning of the training phase. My dataset is UTF-8 encoded and in CoNLL-2003 format. Does Glample support UTF-8? If so, what else could be the problem?
The error:
File "loader.py", line 43, in update_tag_scheme
'Please check sentence %i:\n%s' % (i, s_str))
Exception: <exception str() failed>

Thanks

Dung Thai · Answer 1 · Wed Apr 12 2017 02:09:47 GMT+0800 (China Standard Time)

You might need to change the encoding in loader.py from 'utf8' to the encoding your data actually uses. For example, I used 'latin-1' for Spanish and German.
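To illustrate the suggestion above, here is a minimal sketch of reading a CoNLL-style file with an explicit encoding. It is not the code from loader.py: the function names read_conll_sentences and first_working_encoding are hypothetical, and the sketch assumes one token per line with blank lines between sentences.

```python
import io


def read_conll_sentences(path, encoding="utf-8"):
    """Read a CoNLL-style file into a list of sentences (lists of token rows).

    Hypothetical helper for illustration; not part of the tagger itself.
    """
    sentences, current = [], []
    # io.open decodes the file with the given codec on Python 2 and Python 3.
    with io.open(path, "r", encoding=encoding) as handle:
        for line in handle:
            line = line.rstrip()
            if not line:
                # A blank line ends the current sentence.
                if current:
                    sentences.append(current)
                    current = []
            else:
                # Columns are whitespace-separated: word first, tag last.
                current.append(line.split())
    if current:
        sentences.append(current)
    return sentences


def first_working_encoding(path, candidates=("utf-8", "latin-1")):
    """Return the first candidate codec that decodes the whole file, else None.

    Caveat: 'latin-1' maps every byte to a character, so it never raises.
    A successful decode therefore does not prove the codec is the right one.
    """
    for name in candidates:
        try:
            with io.open(path, "r", encoding=name) as handle:
                handle.read()
            return name
        except UnicodeDecodeError:
            continue
    return None
```

Because 'latin-1' accepts any byte sequence, list it last and check the decoded text by eye. Persian text cannot be represented in 'latin-1', so for a Persian dataset 'utf-8' is the more plausible setting.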

Rabia-Noureen · Answer 2 · Mon Sep 04 2017 20:40:53 GMT+0800 (China Standard Time)

Hi, I am having the same issue with an English dataset. My dataset, stanfordSentimentTreebank, is encoded in UTF-8, and I am using the GoogleNews pretrained word embeddings, which come as a .gz file:
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
Please advise, as I am stuck on this error.

Rabia-Noureen · Answer 3 · Mon Sep 04 2017 20:41:55 GMT+0800 (China Standard Time)

@dungtn, could you please help me solve this issue?