glample / tagger

Named Entity Recognition Tool

Unicode

mahsash opened this issue

Hi.
I'm having problems training my own model on a Persian dataset. It throws an error right at the beginning of the training phase. My dataset is UTF-8 encoded and in CoNLL-2003 format. Does Glample support UTF-8? If it does, what else could be the problem?
The error:
  File "loader.py", line 43, in update_tag_scheme
    'Please check sentence %i:\n%s' % (i, s_str))
Exception: <exception str() failed>

Thanks
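The message fragment in the traceback ('Please check sentence %i') suggests the loader validates the tags of each sentence, and `<exception str() failed>` is what Python 2 typically prints when an exception message contains non-ASCII text (here, presumably the Persian sentence being reported), so the useful part of the message is hidden. Below is a minimal Python 3 sketch for locating the offending sentence yourself. It assumes the tag is the last whitespace-separated column and that valid tags are `O`, `B-TYPE` or `I-TYPE`; the function names are hypothetical (not part of glample/tagger), and the rule may be slightly looser or stricter than the tagger's own check.

```python
import io


def parse_conll(lines):
    """Group whitespace-separated token rows into sentences.

    A blank line ends a sentence; CoNLL-2003 "-DOCSTART-" pseudo-sentences
    are dropped.
    """
    sentences, current = [], []
    for line in lines:
        line = line.rstrip()
        if not line:
            if current:
                sentences.append(current)
                current = []
        else:
            current.append(line.split())
    if current:
        sentences.append(current)
    return [s for s in sentences if s[0][0] != "-DOCSTART-"]


def is_iob_tag(tag):
    """True for "O", or "B-TYPE" / "I-TYPE" with a non-empty TYPE."""
    if tag == "O":
        return True
    prefix, sep, entity = tag.partition("-")
    return sep == "-" and prefix in ("B", "I") and entity != ""


def find_bad_sentences(sentences):
    """Return (index, sentence) pairs whose last column has a non-IOB tag."""
    return [(i, s) for i, s in enumerate(sentences)
            if not all(is_iob_tag(row[-1]) for row in s)]


def report_bad_sentences(path, encoding="utf-8"):
    """Print every offending sentence in a CoNLL file, one token row per line."""
    with io.open(path, "r", encoding=encoding) as f:
        sentences = parse_conll(f)
    for i, s in find_bad_sentences(sentences):
        print("sentence %d:" % i)
        for row in s:
            print("  " + " ".join(row))
```

Calling `report_bad_sentences("train.txt")` (hypothetical path) prints each flagged sentence. A row with only one column is flagged too, since its "tag" is then the token itself. Indices count sentences after dropping `-DOCSTART-` lines and may not match the tagger's numbering exactly.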

You might need to change the encoding in loader.py from 'utf8' to the encoding your data actually uses; for example, I used 'latin-1' for Spanish and German.
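Before editing loader.py, it can help to check which encoding the file actually decodes with. A minimal sketch, assuming only the candidate encodings listed are of interest; both helper names are hypothetical:

```python
def detect_encoding(data, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes `data` without error.

    Order matters: latin-1 maps every byte to a character, so it never
    fails and should stay last as a fallback.
    """
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


def detect_file_encoding(path, candidates=("utf-8", "latin-1")):
    """Read a file as raw bytes and apply detect_encoding to its contents."""
    with open(path, "rb") as f:
        return detect_encoding(f.read(), candidates)
```

If this returns "utf-8" for a file that still fails to load, the encoding setting is probably not the cause, and the tag column is worth checking instead.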

Hi, I'm having the same issue with an English dataset. My dataset (stanfordSentimentTreebank) is UTF-8 encoded, and I'm using the pretrained GoogleNews word embeddings, which come as a .gz file:
https://drive.google.com/file/d/0B7XkCwpI5KDYNlNUTTlSS21pQmM/edit
Could you please advise? I'm stuck on this error.
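The GoogleNews vectors are commonly distributed as a gzipped binary word2vec file, whereas a loader that reads embeddings line by line as UTF-8 text expects one word followed by its vector values per line. That the tagger reads pretrained embeddings this way is an assumption, not something confirmed in this thread. Below is a heuristic sketch for telling the two formats apart, plus a plain-text reader; the function names are hypothetical.

```python
import codecs
import gzip


def _open(path, mode, **kwargs):
    """Open plain or gzip-compressed files transparently (by .gz suffix)."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, mode, **kwargs)


def looks_like_text_embeddings(path, sample_size=65536):
    """Heuristic: True if the first bytes decode as UTF-8 text.

    Binary word2vec files store raw float32 values after each word, which
    almost never form valid UTF-8, so decoding usually fails for them.
    """
    with _open(path, "rb") as f:
        sample = f.read(sample_size)
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off by sample_size
        decoder.decode(sample, final=False)
        return True
    except UnicodeDecodeError:
        return False


def load_text_embeddings(path, encoding="utf-8"):
    """Read 'word v1 v2 ...' lines into a dict of word -> list of floats.

    A first line of the form '<vocab size> <dim>' (word2vec text header)
    is skipped, as are blank lines.
    """
    vectors = {}
    with _open(path, "rt", encoding=encoding) as f:
        for i, line in enumerate(f):
            if not line.strip():
                continue
            parts = line.rstrip().split(" ")
            if i == 0 and len(parts) == 2:
                continue
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors
```

If the check returns False, one option is to convert the file to text format first, for example with gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` followed by `save_word2vec_format(out_path, binary=False)`.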

@dungtn, could you please help me solve this issue?