dsindex / syntaxnet

reference code for syntaxnet

Cannot train POS on another corpus ...

elduba opened this issue · comments

Hi !
First of all, I would like to thank you for this great tool and the convert script you provide; it works great.

But I am facing some issues with the French corpus.

Could you please correct or complete my understanding of the configuration steps required for training on another corpus?

  1. Create a new folder in work (in my example UD_French) with 3 files : *-ud-dev.conllu / *-ud-test.conllu / *-ud-train.conllu
  2. Add the context.pbtxt and update file location value + record-format to "french-text"
  3. Update train.sh with correct file location value
  4. Run train.sh

Then I am stuck with this error:

File "/home/baduel/models/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/external/tf/tensorflow/python/client/session.py", line 673, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: indices[0] = -1 is not in [0, 1)
     [[Node: training/embedding_lookup_4 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@training/Diag"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](training/Diag, training/gold_actions)]]
Caused by op u'training/embedding_lookup_4', defined at:
  File "/home/baduel/models/syntaxnet/bazel-bin/syntaxnet/parser_trainer.runfiles/syntaxnet/parser_trainer.py", line 303, in <module>
    tf.app.run()

Any help would be life saving :)

Best regards
Edulba

@elduba
I noticed that there is no XPOS in the UD_French corpus (the fifth column is "_" on every token line):

1   Les le  DET _   Definite=Def|Gender=Fem|Number=Plur 2   det _   _
2   commotions  commotion   NOUN    _   Gender=Fem|Number=Plur  5   nsubj   _   _
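A quick way to confirm this on a whole file is to count the values in the XPOS column. The snippet below is only a rough sketch: the helper name `xpos_counts` is hypothetical (it is not part of convert.py), and it assumes standard 10-column, tab-separated CoNLL-U token lines.

```python
from collections import Counter


def xpos_counts(lines):
    """Count the values found in the XPOS column (5th field) of CoNLL-U lines.

    Hypothetical helper for illustration; assumes tab-separated,
    10-column token lines and skips comments and blank lines.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # sentence separator or comment line
        fields = line.split("\t")
        if len(fields) != 10:
            continue  # not a regular token line
        counts[fields[4]] += 1
    return counts


# The two token lines quoted above, with their tabs restored.
sample = [
    "1\tLes\tle\tDET\t_\tDefinite=Def|Gender=Fem|Number=Plur\t2\tdet\t_\t_",
    "2\tcommotions\tcommotion\tNOUN\t_\tGender=Fem|Number=Plur\t5\tnsubj\t_\t_",
]
print(xpos_counts(sample))
```

If "_" is the only key that shows up, the corpus carries no language-specific tags. To run it on a real file, pass an open file handle instead of `sample`.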

So I modified the convert.py script:

        # tokens[3] is UPOS, tokens[4] is XPOS (0-based CoNLL-U columns)
        if tokens[4] == '_':
            tokens[4] = tokens[3]  # there is no XPOS: XPOS <- UPOS
        else:
            tokens[3] = tokens[4]  # UPOS <- XPOS
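For anyone who wants to try the same idea outside convert.py, here is a self-contained sketch of that fallback applied to one line. The function name `fill_pos_columns` is made up for this example, and it assumes tab-separated CoNLL-U input; the real convert.py may read and write lines differently.

```python
def fill_pos_columns(line):
    """Apply the UPOS/XPOS fallback from the snippet above to one CoNLL-U line.

    Hypothetical wrapper for illustration only. Comment lines, blank lines,
    and anything that is not a 10-column token line are returned unchanged.
    """
    stripped = line.rstrip("\n")
    if not stripped or stripped.startswith("#"):
        return stripped
    tokens = stripped.split("\t")
    if len(tokens) != 10:
        return stripped
    if tokens[4] == "_":
        tokens[4] = tokens[3]  # there is no XPOS: copy UPOS into XPOS
    else:
        tokens[3] = tokens[4]  # XPOS exists: copy it into UPOS
    return "\t".join(tokens)


line = "1\tLes\tle\tDET\t_\tDefinite=Def|Gender=Fem|Number=Plur\t2\tdet\t_\t_"
print(fill_pos_columns(line))
```

Either way, both POS columns end up holding the same tag, so training no longer sees a column that contains only "_".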

It works:

...
I syntaxnet/reader_ops.cc:141] Starting epoch 1
INFO:tensorflow:Epochs: 1, num steps: 100, seconds elapsed: 1.44, avg cost: 2.20,
INFO:tensorflow:Epochs: 1, num steps: 200, seconds elapsed: 2.10, avg cost: 1.40,
INFO:tensorflow:Epochs: 1, num steps: 300, seconds elapsed: 2.76, avg cost: 0.99,
INFO:tensorflow:Epochs: 1, num steps: 400, seconds elapsed: 3.42, avg cost: 0.78,
INFO:tensorflow:Epochs: 1, num steps: 500, seconds elapsed: 4.07, avg cost: 0.69,
INFO:tensorflow:Epochs: 1, num steps: 600, seconds elapsed: 4.74, avg cost: 0.63,
INFO:tensorflow:Epochs: 1, num steps: 700, seconds elapsed: 5.38, avg cost: 0.54,
....

Thanks! It works very well.
👍