trouble with gradient_check
stoictraveler opened this issue
I am using a 980 Ti on Ubuntu 16.04 LTS with CUDA 7.5.
Below is the gdb printout:
Thread 1 "persistent-rnn-" received signal SIGBUS, Bus error.
__memmove_avx_unaligned ()
at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:146
146 ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: No such file or directory.
(gdb) bt
#0 __memmove_avx_unaligned ()
at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:146
#1 0x00007ffff532c83f in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff53cf81e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff53cfa4b in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff53d0b5f in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007ffff514e1a2 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007ffff514e8c5 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007ffff544cda2 in cuMemcpyAsync ()
from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#8 0x00007ffff623a33d in ?? () from /usr/local/cuda/lib64/libcudart.so
#9 0x00007ffff621ccbb in ?? () from /usr/local/cuda/lib64/libcudart.so
#10 0x00007ffff624f628 in cudaMemcpyAsync ()
from /usr/local/cuda/lib64/libcudart.so
#11 0x00007ffff776ae7b in prnn::parallel::CudaRuntimeLibrary::cudaMemcpyAsync(void*, void const*, unsigned long, prnn::parallel::CudaRuntimeLibrary::cudaMemcpyKind, void*) ()
from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#12 0x00007ffff7841f9b in void prnn::rnn::detail::dispatchForwardPropRecurrent<prnn::matrix::RectifiedLinear, prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> > >(prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> >::RealType*, prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> >::RealType const*, prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> >::RealType*, prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> > const&) ()
from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#13 0x00007ffff7842f8c in void prnn::rnn::detail::forwardPropRecurrent<prnn::matrix::RectifiedLinear, prnn::matrix::SinglePrecision, (prnn::RecurrentLayerDirection)0>(prnn::matrix::DynamicView const&, prnn::matrix::ConstDynamicView const&, prnn::matrix::DynamicView const&, prnn::RecurrentOpsHandle const&, std::tuple<prnn::matrix::SinglePrecision> const&) ()
from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#14 0x00007ffff7840062 in prnn::rnn::detail::forwardPropRecurrentOverActivationFunctions(prnn::matrix::DynamicView const&, prnn::matrix::ConstDynamicView const&, prnn::matrix::DynamicView const&, prnn::RecurrentOpsHandle const&) ()
from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#15 0x00007ffff7752ecf in prnnRNNForward ()
from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#16 0x0000000000407d8b in TestCForwardOps(Options const&) ()
#17 0x0000000000406ccb in RunTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void (*)(Options const&), Options&)
()
#18 0x00000000004049e1 in main ()
Bus errors usually mean out-of-bounds accesses. Let me try to reproduce this...
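For anyone trying to narrow this down locally: frame #0 is a host-side memmove underneath cudaMemcpyAsync, so one possibility is that the host range handed to the copy extends past its buffer. Below is a minimal sketch of a range check before a raw copy. `checkedCopy` is a hypothetical helper written only for illustration, not part of prnn, and it uses plain `std::memcpy` on host vectors rather than the CUDA call in the trace.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not part of prnn): copies `count` floats from `src`,
// starting at element `srcOffset`, into the front of `dst`. It throws
// instead of reading or writing past the end of either buffer.
inline void checkedCopy(std::vector<float>& dst,
                        const std::vector<float>& src,
                        std::size_t srcOffset,
                        std::size_t count)
{
    // Compare against the remaining length so srcOffset + count cannot overflow.
    if (srcOffset > src.size() || count > src.size() - srcOffset) {
        throw std::out_of_range("checkedCopy: source range exceeds buffer");
    }
    if (count > dst.size()) {
        throw std::out_of_range("checkedCopy: destination too small");
    }
    if (count == 0) {
        return;  // nothing to copy; also avoids memcpy on empty buffers
    }
    std::memcpy(dst.data(), src.data() + srcOffset, count * sizeof(float));
}
```

Wrapping the copy that crashes in a check like this turns a SIGBUS into an exception that names the bad range, which is usually easier to trace back to the caller than a fault inside libcuda.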
The recently merged PR should address this issue. Please let me know if you still run into problems.