Training always hangs at session.run()
scofield7419 opened this issue
Hao Fei commented
I have tried training the model with singleton.py hundreds of times, and the process always hangs at:
tf_loss, tf_global_step, _ = session.run([model.loss, model.global_step, model.train_op])
You can see from the snapshot that the code runs fine up to that point, but then it just hangs there.
After some debugging, I believe the problem lies in:
enqueue_thread = threading.Thread(target=_enqueue_loop)
since the process never enters '_enqueue_loop()'.
In other words, the 'FIFOQueue' fed by that 'thread' never gets scheduled.
Any help with this would be appreciated.
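For context, the pipeline in question follows the usual TF 1.x pattern: a PaddingFIFOQueue is fed by a background Python thread running an enqueue loop, while the training step dequeues from it. Below is a minimal, self-contained sketch of that pattern; the dtypes, shapes, and example data are placeholders for illustration, not the repo's actual ones.

import threading
import tensorflow as tf  # assumes the TF 1.x API

dtypes = [tf.int32, tf.int32]   # hypothetical input signature
shapes = [[None], [None]]

queue = tf.PaddingFIFOQueue(capacity=2, dtypes=dtypes, shapes=shapes)
placeholders = [tf.placeholder(d, s) for d, s in zip(dtypes, shapes)]
enqueue_op = queue.enqueue(placeholders)
inputs = queue.dequeue()

def _enqueue_loop(session):
    # Feed one dummy example per iteration; enqueue() blocks once the queue is full.
    while True:
        example = [[1, 2, 3], [4, 5]]
        session.run(enqueue_op, feed_dict=dict(zip(placeholders, example)))

with tf.Session() as session:
    enqueue_thread = threading.Thread(target=_enqueue_loop, args=(session,))
    enqueue_thread.daemon = True   # do not block process exit
    enqueue_thread.start()
    print(session.run(inputs))     # hangs here if the enqueue thread never feeds the queue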
Hao Fei commented
Solved!
queue = tf.PaddingFIFOQueue(capacity=2, dtypes=dtypes, shapes=new_shapes)
Setting a very large capacity was what caused the hang; reducing it to a small value (here 2) fixed it.
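If anyone hits a similar hang, one quick sanity check (a hypothetical debugging snippet, not part of the repo's code) is to ask the queue how many elements it currently holds, which shows whether the enqueue thread is actually feeding it:

num_queued = session.run(queue.size())
print("examples currently in the queue:", num_queued)  # stays at 0 if the enqueue thread never ran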