dsindex / syntaxnet

reference code for syntaxnet

How to use the CoNLL 2017 baselines?

zhou-zh opened this issue · comments

Thanks for your great work! I saw your reply on Stack Overflow, so I know you have built your own system. I have two questions about it:

  1. You trained your model on English, and I have also trained it once. The official CoNLL 2017 baselines provide models for different languages, but I don't know which entries in the script to modify in order to train models for other languages.
  2. Your eval script works well, but I can't find the baseline_eval.py that their README mentions. Do you know where it is?

I am sorry that my questions may not be directly related to your models, but they are really important for me. If you know the answers, please tell me. Thank you very much.

hello~

  1. I understand that you want to train a model for another language. If so, you can check this issue:
  • #21 (comment)
  • To train, you basically need to download the UD corpora first.
  • After downloading, modify train_dragnn.sh for your language and run the script. The English defaults look like this (an adapted example for another language follows this list):

```sh
SRC_CORPUS_DIR=${CDIR}/UD_English
TRAIN_FILE=${DATA_DIR}/en-ud-train.conllu.conv
DEV_FILE=${DATA_DIR}/en-ud-dev.conllu.conv
```

  2. You can check tensorflow/models#1211 (comment).
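To adapt the snippet above to another treebank, only the corpus paths should need to change. A minimal sketch for UD Chinese, assuming the treebank follows the standard UD v2 naming convention (zh-ud-*.conllu) and that train_dragnn.sh applies the same .conv conversion suffix:

```sh
# hypothetical adaptation of train_dragnn.sh for UD Chinese;
# directory and file names are assumptions based on the UD v2 layout
SRC_CORPUS_DIR=${CDIR}/UD_Chinese
TRAIN_FILE=${DATA_DIR}/zh-ud-train.conllu.conv
DEV_FILE=${DATA_DIR}/zh-ud-dev.conllu.conv
```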

@dsindex, thanks for your reply!
If I want to train on a different language, should I just modify the path to point at the dataset for that language?
Do we not need to use the different per-language models provided by the CoNLL 2017 baselines guide?
I thought that models for different languages have different word maps.

@continuesmile
yes~
place a corpus at that path and modify the script to train your own model.
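If it helps, a tiny sanity check along these lines (purely a sketch, reusing the variable names from the snippet above) can catch a wrong corpus path before a long training run:

```sh
# illustrative only: verify the converted corpus files exist before training
for f in "${TRAIN_FILE}" "${DEV_FILE}"; do
  if [ ! -f "$f" ]; then
    echo "missing corpus file: $f" >&2
    exit 1
  fi
done
```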

The models provided by the CoNLL 2017 baselines guide were trained with the tools at https://github.com/tensorflow/models/tree/master/syntaxnet/dragnn/tools

Those scripts are the original ones; mine is a modified version for convenience.

Hi @dsindex,

Should I train the segmentation model myself?
I trained the model with the UD Chinese corpus, but the UAS and LAS are only 68.36% and 58.96%, which is much worse than the baseline. Do you have any hints?

Thanks again
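As a side note on reproducing UAS/LAS figures like those above: the CoNLL 2017 shared task also publishes a standalone scorer, conll17_ud_eval.py, which compares a system's CoNLL-U output against the gold file. A minimal sketch (both file names here are placeholders):

```sh
# score parser output against gold with the official CoNLL 2017 scorer;
# the -v flag prints the full metrics table, including UAS and LAS rows
python conll17_ud_eval.py -v zh-ud-dev.conllu system-output.conllu
```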