Training always hangs at session.run()
scofield7419 opened this issue
Hao Fei commented
I have tried training the model with singleton.py hundreds of times, and the process always hangs at:
tf_loss, tf_global_step, _ = session.run([model.loss, model.global_step, model.train_op])
You can see from the snapshot that the code runs fine up to that point, but then it just hangs there.
After some debugging, I believe the problem lies in:
enqueue_thread = threading.Thread(target=_enqueue_loop)
since the process never enters '_enqueue_loop()'.
In other words, the 'FIFOQueue' fed by that 'thread' never gets scheduled.
Any help with this would be appreciated.
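For context, the pipeline in question follows the usual TF 1.x pattern: a PaddingFIFOQueue is fed by a background Python thread running an enqueue loop, while the training step dequeues from it. Below is a minimal, self-contained sketch of that pattern; the dtypes, shapes, and example data are placeholders for illustration, not the repo's actual ones.

import threading
import tensorflow as tf  # assumes the TF 1.x API

dtypes = [tf.int32, tf.int32]   # hypothetical input signature
shapes = [[None], [None]]

queue = tf.PaddingFIFOQueue(capacity=2, dtypes=dtypes, shapes=shapes)
placeholders = [tf.placeholder(d, s) for d, s in zip(dtypes, shapes)]
enqueue_op = queue.enqueue(placeholders)
inputs = queue.dequeue()

def _enqueue_loop(session):
    # Feed one dummy example per iteration; enqueue() blocks once the queue is full.
    while True:
        example = [[1, 2, 3], [4, 5]]
        session.run(enqueue_op, feed_dict=dict(zip(placeholders, example)))

with tf.Session() as session:
    enqueue_thread = threading.Thread(target=_enqueue_loop, args=(session,))
    enqueue_thread.daemon = True   # do not block process exit
    enqueue_thread.start()
    print(session.run(inputs))     # hangs here if the enqueue thread never feeds the queue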
Hao Fei commented
Solved!
queue = tf.PaddingFIFOQueue(capacity=2, dtypes=dtypes, shapes=new_shapes)
Setting a very large capacity was what caused the hang; reducing it to a small value (here 2) fixed it.
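If anyone hits a similar hang, one quick sanity check (a hypothetical debugging snippet, not part of the repo's code) is to ask the queue how many elements it currently holds, which shows whether the enqueue thread is actually feeding it:

num_queued = session.run(queue.size())
print("examples currently in the queue:", num_queued)  # stays at 0 if the enqueue thread never ran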