senarvi / theanolm

TheanoLM is a recurrent neural network language modeling tool implemented using Theano

scoring error

sameerkhurana10 opened this issue

Hi,

I am getting the following error while calculating perplexity on the test set:

Mapped name None to device cuda: GeForce GTX TITAN Black (0000:03:00.0)
2018-04-09 11:44:37,415 exception_handler: An unexpected KeyError exception occurred: 'Unable to get link info (bad symbol table node signature)'
Traceback will be written to debug log (enable with --log-level debug).
srun: error: sls-titan-0: task 0: Exited with exit code 2
(theano-lm) sameerk@sls-415-1:/data/sls/qcri/asr/sameer_v1/asr/kaldi-forked/kaldi/egs/mit_qcri/s5_language_modeling/theanolm/recipes/arabic$ srun -p gpu --gres=gpu:1 theanolm score exp/blstm256_voc80k_blstm/nnlm.h5 data/rnnlm_data_all/test.dat --output perplexity --log-level debug
2018-04-09 12:38:40,288 get_default_device: Context None device="GeForce GTX TITAN Black" ID="0000:03:00.0"
2018-04-09 12:38:40,291 from_file: Reading vocabulary from network state.
/data/sls/u/sameerk/anaconda3/envs/theano-lm/lib/python3.5/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using cuDNN version 6021 on context None
Mapped name None to device cuda: GeForce GTX TITAN Black (0000:03:00.0)
2018-04-09 12:38:40,292 exception_handler: An unexpected KeyError exception occurred: 'Unable to get link info (bad symbol table node signature)'
Traceback will be written to debug log (enable with --log-level debug).
2018-04-09 12:38:40,293 exception_handler: Traceback:
2018-04-09 12:38:40,339 exception_handler: File "/data/sls/u/sameerk/anaconda3/envs/theano-lm/bin/theanolm", line 147, in <module>
    main()
2018-04-09 12:38:40,339 exception_handler: File "/data/sls/u/sameerk/anaconda3/envs/theano-lm/bin/theanolm", line 88, in main
    args.command_function(args)
2018-04-09 12:38:40,340 exception_handler: File "/data/sls/u/sameerk/anaconda3/envs/theano-lm/lib/python3.5/site-packages/theanolm/commands/score.py", line 114, in score
    default_device=default_device)
2018-04-09 12:38:40,340 exception_handler: File "/data/sls/u/sameerk/anaconda3/envs/theano-lm/lib/python3.5/site-packages/theanolm/network/network.py", line 280, in from_file
    vocabulary = Vocabulary.from_state(state)
2018-04-09 12:38:40,340 exception_handler: File "/data/sls/u/sameerk/anaconda3/envs/theano-lm/lib/python3.5/site-packages/theanolm/vocabulary/vocabulary.py", line 289, in from_state
    if 'words' not in h5_vocabulary:
2018-04-09 12:38:40,340 exception_handler: File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
2018-04-09 12:38:40,340 exception_handler: File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
2018-04-09 12:38:40,340 exception_handler: File "/data/sls/u/sameerk/anaconda3/envs/theano-lm/lib/python3.5/site-packages/h5py/_hl/group.py", line 319, in __contains__
    return self._e(name) in self.id
2018-04-09 12:38:40,340 exception_handler: File "h5py/h5g.pyx", line 441, in h5py.h5g.GroupID.__contains__
2018-04-09 12:38:40,340 exception_handler: File "h5py/h5g.pyx", line 442, in h5py.h5g.GroupID.__contains__
2018-04-09 12:38:40,341 exception_handler: File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
2018-04-09 12:38:40,341 exception_handler: File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
2018-04-09 12:38:40,341 exception_handler: File "h5py/h5g.pyx", line 511, in h5py.h5g._path_valid
2018-04-09 12:38:40,341 exception_handler: File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
2018-04-09 12:38:40,341 exception_handler: File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
2018-04-09 12:38:40,341 exception_handler: File "h5py/h5l.pyx", line 212, in h5py.h5l.LinkProxy.exists
srun: error: sls-titan-0: task 0: Exited with exit code 2

score command:

srun -p gpu --gres=gpu:1 theanolm score exp/blstm256_voc80k_blstm/nnlm.h5 data/rnnlm_data_all/test.dat --output perplexity --log-level debug

train command:

theanolm train exp/blstm256_voc80k_blstm/nnlm.h5 --training-set data/rnnlm_data_all/transcript.dat --vocabulary data/rnnlm_data_all/input_80000.vocab --vocabulary-format words --sequence-length 25 --batch-size 32 --optimization-method adagrad --stopping-criterion no-improvement --cost cross-entropy --learning-rate 1 --gradient-decay-rate 0.9 --numerical-stability-term 1e-6 --num-noise-samples 1 --noise-distribution unigram --noise-dampening 0.5 --validation-frequency 1 --patience 0 --min-epochs 1 --max-epochs 15 --random-seed 1 --log-level debug --log-interval 200 --gradient-normalization 5 --architecture ../architectures/word-blstm256.arch --validation-file data/rnnlm_data_all/dev.dat

Checking the size of the model file:

161k, which looks suspiciously small.

What does this error mean?

It seems that the model file is corrupted: the HDF5 library raises a KeyError when TheanoLM tries to read the vocabulary from it. So the problem is in training, not scoring. Is there anything suspicious in the training log?
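One way to confirm the corruption independently of TheanoLM is to inspect the file directly. The following is only a rough sketch, not part of TheanoLM: the function names `has_hdf5_signature` and `find_unreadable_objects` are made up for illustration, the second function assumes h5py is installed, and the signature check assumes the file has no user block in front of the HDF5 superblock.

```python
# The 8-byte signature that starts an HDF5 file written without a user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file begins with the HDF5 format signature."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE


def find_unreadable_objects(path):
    """Walk every group and dataset in the file and read each dataset.

    Returns a list of error descriptions. The walk stops at the first
    failure, so the list has at most one entry. An empty list means that
    everything in the file could be read.
    """
    # Imported here so that the signature check above works without h5py.
    import h5py

    problems = []

    def visit(name, obj):
        # Reading the full contents forces HDF5 to touch the stored data,
        # not just the link that points to it.
        if isinstance(obj, h5py.Dataset):
            obj[()]

    try:
        with h5py.File(path, "r") as f:
            f.visititems(visit)
    except (OSError, KeyError, RuntimeError, ValueError) as e:
        # h5py reports damaged files with different exception types
        # depending on where the damage is found.
        problems.append("{}: {}".format(type(e).__name__, e))
    return problems
```

For the model in this thread, the call would be `find_unreadable_objects("exp/blstm256_voc80k_blstm/nnlm.h5")`. A non-empty result would support the idea that the training run left a damaged file behind.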

Probably right. The other models are fine. I got a bus error for this model, so I think it has nothing to do with TheanoLM.