problem with training: less leaves in the beam search than requested

Question

jmokoistinen opened this issue 6 years ago

python train.py with config3.json
...
INFO:tensorflow:Saving checkpoints for 1101 into ./model3/model.ckpt.
INFO - tensorflow - Saving checkpoints for 1101 into ./model3/model.ckpt.
INFO:tensorflow:global_step/sec: 1.98484
INFO - tensorflow - global_step/sec: 1.98484
INFO:tensorflow:loss = 0.43665585, step = 1100 (50.381 sec)
INFO - tensorflow - loss = 0.43665585, step = 1100 (50.381 sec)

Loss : [0.441212237]
...
Loss : [0.68871814]
2018-08-14 11:59:03.120680: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
Loss : [inf]
Loss : [0.642776]
Loss : [0.515933156]
Loss : [0.659167171]
INFO:tensorflow:global_step/sec: 1.87844
INFO - tensorflow - global_step/sec: 1.87844
INFO:tensorflow:loss = 0.6591672, step = 1200 (53.237 sec)
INFO - tensorflow - loss = 0.6591672, step = 1200 (53.237 sec)
Loss : [1.09572434]
...
Loss : [0.784255]
2018-08-14 11:59:26.465221: W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
Loss : [inf]
Loss : [0.728220403]
...
Loss : [0.532087326]
2018-08-14 12:00:05.965566: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at ctc_decoder_ops.cc:322 : Invalid argument: Less leaves in the beam search than requested.
ERROR - CRNN_experiment - Failed after 0:11:56!
Traceback (most recent calls WITHOUT Sacred internals):
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Less leaves in the beam search than requested.
[[Node: code2str_conversion/CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=2, _device="/job:localhost/replica:0/task:0/device:CPU:0"](deep_bidirectional_lstm/transpose_time_major/_667, code2str_conversion/Cast_1/_695)]]
[[Node: code2str_conversion/chars_conversion/cond/map/TensorArray_1/_731 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3701_code2str_conversion/chars_conversion/cond/map/TensorArray_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent calls WITHOUT Sacred internals):
File "train_mk.py", line 118, in run
image_summaries=True))
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/estimator/estimator.py", line 366, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/estimator/estimator.py", line 1119, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/estimator/estimator.py", line 1135, in _train_model_default
saving_listeners)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/estimator/estimator.py", line 1336, in _train_with_estimator_spec
_, loss = mon_sess.run([estimator_spec.train_op, estimator_spec.loss])
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/training/monitored_session.py", line 577, in run
run_metadata=run_metadata)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/training/monitored_session.py", line 1053, in run
run_metadata=run_metadata)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/training/monitored_session.py", line 1144, in run
raise six.reraise(*original_exc_info)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/six.py", line 693, in reraise
raise value
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/training/monitored_session.py", line 1129, in run
return self._sess.run(*args, **kwargs)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/training/monitored_session.py", line 1201, in run
run_metadata=run_metadata)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/training/monitored_session.py", line 981, in run
return self._sess.run(*args, **kwargs)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Less leaves in the beam search than requested.
[[Node: code2str_conversion/CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=2, _device="/job:localhost/replica:0/task:0/device:CPU:0"](deep_bidirectional_lstm/transpose_time_major/_667, code2str_conversion/Cast_1/_695)]]
[[Node: code2str_conversion/chars_conversion/cond/map/TensorArray_1/_731 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3701_code2str_conversion/chars_conversion/cond/map/TensorArray_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Caused by op 'code2str_conversion/CTCBeamSearchDecoder', defined at:
File "train_mk.py", line 50, in <module>
training_params: dict, _config):
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/sacred-0.7.4-py3.6.egg/sacred/experiment.py", line 137, in automain
self.run_commandline()
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/sacred-0.7.4-py3.6.egg/sacred/experiment.py", line 260, in run_commandline
return self.run(cmd_name, config_updates, named_configs, {}, args)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/sacred-0.7.4-py3.6.egg/sacred/experiment.py", line 209, in run
run()
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/sacred-0.7.4-py3.6.egg/sacred/run.py", line 221, in __call__
self.result = self.main_function(*args)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/sacred-0.7.4-py3.6.egg/sacred/config/captured_function.py", line 46, in captured_function
result = wrapped(*args, **kwargs)
File "train_mk.py", line 118, in run
image_summaries=True))
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/estimator/estimator.py", line 366, in train
loss = self._train_model(input_fn, hooks, saving_listeners)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/estimator/estimator.py", line 1119, in _train_model
return self._train_model_default(input_fn, hooks, saving_listeners)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/estimator/estimator.py", line 1132, in _train_model_default
features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/estimator/estimator.py", line 1107, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "/data/progs/tf-crnn-master/tf_crnn/model.py", line 331, in crnn_fn
top_paths=parameters.num_beam_paths)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/ops/ctc_ops.py", line 277, in ctc_beam_search_decoder
merge_repeated=merge_repeated))
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/ops/gen_ctc_ops.py", line 73, in ctc_beam_search_decoder
top_paths=top_paths, merge_repeated=merge_repeated, name=name)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "/data/Mika/anaconda3/envs/reOCR/lib/python3.6/site-packages/tensorflow_gpu-1.9.0-py3.6-linux-x86_64.egg/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Less leaves in the beam search than requested.
[[Node: code2str_conversion/CTCBeamSearchDecoder = CTCBeamSearchDecoder[beam_width=100, merge_repeated=false, top_paths=2, _device="/job:localhost/replica:0/task:0/device:CPU:0"](deep_bidirectional_lstm/transpose_time_major/_667, code2str_conversion/Cast_1/_695)]]
[[Node: code2str_conversion/chars_conversion/cond/map/TensorArray_1/_731 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_3701_code2str_conversion/chars_conversion/cond/map/TensorArray_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

solivr · Answer 1 · Wed Aug 15 2018 16:09:16 GMT+0800 (China Standard Time)

Hi,
This error comes from the CTC decoder. There is an existing issue (#12) about it. Can you check your data against the solution suggested there?

MK · Answer 2 · Wed Aug 15 2018 20:43:58 GMT+0800 (China Standard Time)

Yes, the images need to be scaled as proposed there. For example, a word image with at most 40 characters needs a width of 40 x 4 = 160. This also removes the CTC loss calculation problem ("No valid path found").
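
For anyone hitting the same error, here is a minimal sketch of a pre-training data check based on the rule of thumb above. It assumes 4 pixels of image width per character (the factor from the 40 x 4 example) and that the network emits roughly one time step per 4 pixels; both are assumptions to adjust for your own model. The function names are hypothetical and not part of tf-crnn. The sketch also uses the general CTC constraint that adjacent repeated characters each need an extra blank step between them.

```python
def min_image_width(max_chars: int, pixels_per_char: int = 4) -> int:
    """Smallest width suggested by the rule of thumb: chars * pixels per char.

    pixels_per_char=4 is an assumption taken from the 40 x 4 = 160 example.
    """
    return max_chars * pixels_per_char


def ctc_min_steps(label: str) -> int:
    """Minimum number of time steps CTC needs to align `label`.

    One step per character, plus one blank between each pair of
    adjacent identical characters (e.g. the "ll" in "hello").
    """
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def is_wide_enough(label: str, width: int, pixels_per_char: int = 4) -> bool:
    """Check whether an image of `width` pixels can hold `label`.

    Assumes (hypothetically) one output time step per `pixels_per_char`
    pixels of width; replace with your network's real downsampling factor.
    """
    time_steps = width // pixels_per_char
    return time_steps >= ctc_min_steps(label)


if __name__ == "__main__":
    # Hypothetical (label, image width) pairs standing in for a real dataset.
    samples = [("hello", 20), ("hello", 24), ("word", 16)]
    for label, width in samples:
        status = "ok" if is_wide_enough(label, width) else "too narrow"
        print(f"{label!r} at width {width}: {status}")
```

Running a check like this over the training CSV before starting training should flag the samples that are likely to produce `Loss : [inf]` or the beam search failure, so they can be rescaled or dropped.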