data feeder error

Question

fazlekarim opened this issue 6 years ago

After running the code for about 200 steps, I run into the following error and can't figure out why. I feel like it should be an easy fix.

self._session.run(self._enqueue_op, feed_dict=feed_dict)
File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.CancelledError: Enqueue operation was cancelled
[[Node: datafeeder/input_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2)]]

Caused by op 'datafeeder/input_queue_enqueue', defined at:
File "train.py", line 153, in <module>
main()
File "train.py", line 149, in main
train(log_dir, args)
File "train.py", line 58, in train
feeder = DataFeeder(coord, input_path, hparams)
File "/home/fakarim/projects/gst-tacotron/datasets/datafeeder.py", line 46, in __init__
self._enqueue_op = queue.enqueue(self._placeholders)
File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 327, in enqueue
self._queue_ref, vals, name=scope)
File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2777, in _queue_enqueue_v2
timeout_ms=timeout_ms, name=name)
File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

CancelledError (see above for traceback): Enqueue operation was cancelled
[[Node: datafeeder/input_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2)]]

fazlekarim · Answer 1 · Fri Jun 15 2018 01:19:48 GMT+0800 (China Standard Time)

I think I am running out of memory. What type of GPU are you using, and how many? Is there a memory leak somewhere? It makes no sense that I run out of memory only after around 200 steps.

fazlekarim · Answer 2 · Fri Jun 15 2018 01:58:22 GMT+0800 (China Standard Time)

Fixed it. The code is fine; I just overreacted.

marymirzaei · Answer 3 · Fri Jun 15 2018 16:08:18 GMT+0800 (China Standard Time)

@fazlekarim
I am having a similar issue. How did you solve this?

Shan Yang · Answer 4 · Fri Jun 15 2018 16:24:02 GMT+0800 (China Standard Time)

@fazlekarim @lapwing It is an OOM error. Some sentences in the dataset are very long, so a batch that contains one of them can run out of memory at some step. You can fix it in either of two ways:

1. Reduce batch_size or increase the reduce_factor. (Changing the reduce factor will affect performance.)
2. Remove the overly long sentences, for example every sentence longer than 1200 frames. This shrinks the dataset a little, but I guess it will not affect performance much.
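
To see why a few long sentences can matter so much, here is a back-of-the-envelope sketch in Python. It assumes the feeder pads every utterance in a batch to the length of the longest one, which is an assumption about the feeder and is not confirmed in this thread. The function name and the numbers are illustrative only.

```python
def padded_frames(lengths):
    """Total frames in a batch once every utterance is padded to the longest one."""
    return len(lengths) * max(lengths)


# Hypothetical batch of 32 utterances, 400 frames each.
typical = [400] * 32
# Same batch with one utterance replaced by a 1600-frame outlier.
with_outlier = [400] * 31 + [1600]

print(padded_frames(typical))       # 12800
print(padded_frames(with_outlier))  # 51200
```

Under that assumption a single outlier makes the whole batch four times larger, which would explain a failure that only shows up when such a batch happens to be drawn.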

fazlekarim · Answer 5 · Fri Jun 15 2018 19:20:23 GMT+0800 (China Standard Time)

Do you have a script to remove sentences greater than 1200 frames?

Shan Yang · Answer 6 · Sat Jun 16 2018 12:17:54 GMT+0800 (China Standard Time)

@fazlekarim A simple way is to modify the data preprocessing script, as in the attached file.
blizzard2013.zip
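
The attachment is not reproduced here. As a rough stand-in, the sketch below filters a metadata file by frame count. It assumes a pipe-separated train.txt whose third field is the number of frames (for example spec.npy|mel.npy|n_frames|text). The function names, the column index, and the file layout are assumptions for illustration and are not the contents of blizzard2013.zip.

```python
def filter_metadata(lines, max_frames=1200, frames_col=2, sep="|"):
    """Keep only metadata lines whose frame count is at most max_frames.

    frames_col is the assumed zero-based index of the frame-count field.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        if int(fields[frames_col]) <= max_frames:
            kept.append(line)
    return kept


def filter_file(src_path, dst_path, max_frames=1200):
    """Read src_path, drop over-long utterances, write the rest to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        kept = filter_metadata(src, max_frames=max_frames)
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in kept:
            dst.write(line + "\n")
    return len(kept)
```

Calling filter_file("train.txt", "train_filtered.txt") and pointing training at the filtered file would be one way to use it. Check the column order of your own metadata file before relying on frames_col=2.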