SymbioticLab / Salus

Fine-grained GPU sharing primitives

TestMnistConv.test_softmax fails

Aetf opened this issue.

TestMnistConv.test_softmax crashes in closeSession in tensorflow.

Steps to reproduce

  1. Launch the executor.
  2. export TF_CPP_MIN_VLOG_LEVEL=0. This is important: with TF_CPP_MIN_VLOG_LEVEL=3, the test passes.
  3. python test_mnist_tf.py TestMnistConv.test_softmax
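The reproduction steps above can be scripted so that both VLOG levels are tried back to back. This is a minimal sketch, not part of the Salus test suite: `build_repro` and `run_repro` are hypothetical helpers, and the script assumes the executor from step 1 is already running and that `test_mnist_tf.py` is in the current working directory.

```python
import os
import subprocess
import sys

# Test file and test id are taken from the reproduction steps.
# Assumption: the Salus executor is already running and test_mnist_tf.py
# is in the current working directory.
TEST_FILE = "test_mnist_tf.py"
TEST_ID = "TestMnistConv.test_softmax"


def build_repro(vlog_level):
    """Hypothetical helper: return (command, environment) for one run."""
    # Copy the environment so the parent process is left unchanged.
    env = dict(os.environ)
    # TF_CPP_MIN_VLOG_LEVEL controls TensorFlow's VLOG output; per the report,
    # level 0 triggers the crash and level 3 does not.
    env["TF_CPP_MIN_VLOG_LEVEL"] = str(vlog_level)
    cmd = [sys.executable, TEST_FILE, TEST_ID]
    return cmd, env


def run_repro(vlog_level):
    """Hypothetical helper: run the test once and return its exit code."""
    cmd, env = build_repro(vlog_level)
    # A crash or segfault in the test process shows up as a nonzero exit code.
    return subprocess.run(cmd, env=env).returncode
```

Calling `run_repro(0)` and then `run_repro(3)` would, going by this report, be expected to give a nonzero exit code only for the first run (before the fix).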

Expected

Test passes

Actual

Multiple crashes in tensorflow.

  • A crash in tensorflow when handling closeSession, which throws std::system_error (Invalid argument).
  • A segfault during Process, probably due to a garbage value in the tagged_node passed in.

Attached log, which shows a crash during closeSession:
crash-close-sess.tar.gz

It is actually a crash that occurs when running the network on the CPU in TF.

A bisect shows that the crash was introduced in SymbioticLab/tensorflow-salus@a927079b4. It was fixed by removing the logging added there, which was used for debugging a thread pool hanging issue and is no longer needed.

Fixed in SymbioticLab/tensorflow-salus@94f320232