MNIST training hangs in ApplyAdam kernel
Aetf opened this issue
Aetf commented
This happens regardless of whether the executor is using the GPU.
Steps to reproduce
- run executor with EXEC_SCHED_USE_GPU=1 or EXEC_SCHED_USE_GPU=0
- run test: pytest test_mnist_tf.py
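Since the failure is a hang rather than a crash, it may help to run the test step above under a time limit, so that a blocked run is reported instead of waiting forever. The following is only a rough sketch: the helper names `run_with_timeout` and `run_mnist_test` and the 600 second limit are hypothetical, and it assumes the executor has already been started separately.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns ("finished", returncode) on completion, or ("hung", None)
    if the time limit was exceeded (the child process is killed).
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("hung", None)
    return ("finished", proc.returncode)


def run_mnist_test(timeout_s=600):
    # Hypothetical wrapper: 600 s is an arbitrary limit, not a measured one.
    # Assumes the executor is already running in another process.
    cmd = [sys.executable, "-m", "pytest", "test_mnist_tf.py"]
    return run_with_timeout(cmd, timeout_s)
```

Calling `run_mnist_test()` from the directory containing test_mnist_tf.py would then return `("hung", None)` when the test blocks past the limit.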
Expected
Test passes
Actual
Executor blocks waiting for the kernel to finish. In the meantime, GPU utilization is zero.
The block always happens in the ApplyAdam operation.
Logs:
GPU: exec.output.zip
CPU: exec.output.zip