baidu-research / persistent-rnn

Fast Recurrent Networks Library

trouble with gradient_check

stoictraveler opened this issue

I am using a 980 Ti with Ubuntu 16.04 LTS and CUDA 7.5.

Below is the gdb printout:

Thread 1 "persistent-rnn-" received signal SIGBUS, Bus error.
__memmove_avx_unaligned ()
at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:146
146 ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S: No such file or directory.
(gdb) bt
#0 __memmove_avx_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S:146
#1 0x00007ffff532c83f in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#2 0x00007ffff53cf81e in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#3 0x00007ffff53cfa4b in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#4 0x00007ffff53d0b5f in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#5 0x00007ffff514e1a2 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#6 0x00007ffff514e8c5 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#7 0x00007ffff544cda2 in cuMemcpyAsync () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
#8 0x00007ffff623a33d in ?? () from /usr/local/cuda/lib64/libcudart.so
#9 0x00007ffff621ccbb in ?? () from /usr/local/cuda/lib64/libcudart.so
#10 0x00007ffff624f628 in cudaMemcpyAsync () from /usr/local/cuda/lib64/libcudart.so
#11 0x00007ffff776ae7b in prnn::parallel::CudaRuntimeLibrary::cudaMemcpyAsync(void*, void const*, unsigned long, prnn::parallel::CudaRuntimeLibrary::cudaMemcpyKind, void*) () from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#12 0x00007ffff7841f9b in void prnn::rnn::detail::dispatchForwardPropRecurrent<prnn::matrix::RectifiedLinear, prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> > >(prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> >::RealType*, prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> >::RealType const*, prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> >::RealType*, prnn::rnn::RecurrentArchitectureParameters<float, prnn::rnn::TileConfig<1, 8, 8, 4, 4, 2, 4, 0, float> > const&) () from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#13 0x00007ffff7842f8c in void prnn::rnn::detail::forwardPropRecurrent<prnn::matrix::RectifiedLinear, prnn::matrix::SinglePrecision, (prnn::RecurrentLayerDirection)0>(prnn::matrix::DynamicView const&, prnn::matrix::ConstDynamicView const&, prnn::matrix::DynamicView const&, prnn::RecurrentOpsHandle const&, std::tuple<prnn::matrix::SinglePrecision> const&) () from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#14 0x00007ffff7840062 in prnn::rnn::detail::forwardPropRecurrentOverActivationFunctions(prnn::matrix::DynamicView const&, prnn::matrix::ConstDynamicView const&, prnn::matrix::DynamicView const&, prnn::RecurrentOpsHandle const&) () from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#15 0x00007ffff7752ecf in prnnRNNForward () from /home/mewang/projects/persistent-rnn/.release_build/libprnn.so
#16 0x0000000000407d8b in TestCForwardOps(Options const&) ()
#17 0x0000000000406ccb in RunTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void (*)(Options const&), Options&) ()
#18 0x00000000004049e1 in main ()

Bus errors usually mean out-of-bounds accesses. Let me try to reproduce this...
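For anyone trying to narrow down a crash like this, here is a minimal sketch of a bounds-checked host copy. It assumes the fault comes from a host-side copy that reads or writes past the end of a buffer, which the memmove frame at the top of the trace is consistent with but does not prove. `checkedCopy` and `demoCheckedCopy` are hypothetical names for illustration. They are not part of persistent-rnn or the CUDA runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from persistent-rnn): copies `count` floats from
// `src` to `dst`, but only after checking that both buffers hold at least
// `count` elements. An oversized request throws instead of touching memory.
inline void checkedCopy(std::vector<float>& dst,
                        const std::vector<float>& src,
                        std::size_t count)
{
    if (count > src.size() || count > dst.size()) {
        throw std::out_of_range(
            "checkedCopy: count " + std::to_string(count) +
            " exceeds src size " + std::to_string(src.size()) +
            " or dst size " + std::to_string(dst.size()));
    }
    if (count == 0) {
        return; // nothing to copy; skip memcpy on possibly empty buffers
    }
    std::memcpy(dst.data(), src.data(), count * sizeof(float));
}

// Usage sketch: an in-bounds copy succeeds, an oversized one is rejected.
// Returns true when the oversized copy was caught.
inline bool demoCheckedCopy()
{
    std::vector<float> src = {1.0f, 2.0f, 3.0f};
    std::vector<float> dst(3, 0.0f);

    checkedCopy(dst, src, 3);
    assert(dst == src);

    try {
        checkedCopy(dst, src, 4); // one element past the end of both buffers
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

Running the same size check just before the real copy call would turn a silent overrun into an error message that names the offending sizes, which is usually easier to act on than a SIGBUS inside libcuda.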

@stoictraveler

The recently merged PR should address this issue. Please let me know if you still run into problems.