TestMnistConv.test_conv produces the wrong accuracy number
Aetf opened this issue
While the CPU version produces a consistent accuracy number after 50 iterations, our RPC version produces a different number.
After the GPU fixes landed, the GPU version also shows the same behavior.
Steps to reproduce
- Launch the executor
- Run python test_mnist_tf.py TestMnistConv.test_conv
Expected result
Test passes
Actual result
The generated accuracy does not match the one produced by the CPU version in TF:
Traceback (most recent call last):
  File "test_mnist_tf.py", line 129, in test_conv
    self.assertEquals(actual, expected)
AssertionError: 0.249 != 0.68349999
Attached log: test_conv.tar.gz
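As a hedged illustration (not the actual code in test_mnist_tf.py), the sketch below plugs the two numbers from the traceback into a tolerance-based comparison to show that the mismatch is not floating-point rounding noise: even a loose comparison with `assertAlmostEqual(..., delta=1e-3)` fails, because the gap is roughly 0.43. The class name and the `1e-3` tolerance are assumptions chosen for illustration. Note also that `assertEquals` is a deprecated alias of `assertEqual` and was removed in Python 3.12.

```python
import unittest

# Values copied from the traceback in this issue.
RPC_ACCURACY = 0.249
CPU_ACCURACY = 0.68349999


class AccuracyGapSketch(unittest.TestCase):
    """Hypothetical check; not the real TestMnistConv.test_conv."""

    def test_gap_is_not_rounding_noise(self):
        # delta=1e-3 is an assumed tolerance, picked only for illustration.
        # assertAlmostEqual with delta fails when |first - second| > delta,
        # so it raises AssertionError for these two values.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(RPC_ACCURACY, CPU_ACCURACY, delta=1e-3)

    def test_gap_size(self):
        # The absolute gap is about 0.4335, far beyond any float rounding error.
        gap = abs(RPC_ACCURACY - CPU_ACCURACY)
        self.assertGreater(gap, 0.4)
```

Saved to a file, it can be run with the standard `unittest` runner.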
I get a different log: a segmentation fault.
test_conv.tar.gz
Perhaps we compile the source differently.
The log is not fully flushed when the process crashes. After the crash, you can run p logging::logger->flush() in gdb to flush it. I need to know exactly which op kernel is running when the crash happens.
Also, this looks like a different issue. Please open a new issue.
The stack trace is identical to the one in #14. Please use that issue to track the segfault.