Segfault in throughput.cpp
TylerADavis opened this issue
It's not clear what is causing this. Memory corruption is one candidate, and Valgrind can help track that down (though the ibverbs stack produces a lot of false positives).
I'll give that a try. This error happens on TCP as well, so I'll try running it without the RDMA stack.
I ran it under Valgrind, and it reports no memory errors before the segfault itself. The output is below. Does the warning about the SP (stack pointer) changing indicate a stack overflow? Valgrind does mention that as a possibility.
[tylerdavis@f1:/data/tyler/ddc]$ valgrind --leak-check=yes ./benchmarks/throughput
==12246== Memcheck, a memory error detector
==12246== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==12246== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==12246== Command: ./benchmarks/throughput
==12246==
==12246== Warning: client switching stacks? SP change: 0xfff0009b8 --> 0xffe600470
==12246== to suppress, use: --max-stackframe=10487112 or greater
==12246== Invalid write of size 4
==12246== at 0x405067: void test_throughput<10485760ul>(int) (throughput.cpp:41)
==12246== Address 0xffe60047c is on thread 1's stack
==12246==
==12246==
==12246== Process terminating with default action of signal 11 (SIGSEGV)
==12246== Access not within mapped region at address 0xFFE60047C
==12246== at 0x405067: void test_throughput<10485760ul>(int) (throughput.cpp:41)
==12246== If you believe this happened as a result of a stack
==12246== overflow in your program's main thread (unlikely but
==12246== possible), you can try to increase the size of the
==12246== main thread stack using the --main-stacksize= flag.
==12246== The main thread stack size used in this run was 8388608.
==12246== Invalid write of size 8
==12246== at 0x4A28680: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==12246== Address 0xffe600468 is on thread 1's stack
==12246==
==12246==
==12246== Process terminating with default action of signal 11 (SIGSEGV)
==12246== Access not within mapped region at address 0xFFE600468
==12246== at 0x4A28680: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==12246== If you believe this happened as a result of a stack
==12246== overflow in your program's main thread (unlikely but
==12246== possible), you can try to increase the size of the
==12246== main thread stack using the --main-stacksize= flag.
==12246== The main thread stack size used in this run was 8388608.
==12246==
==12246== HEAP SUMMARY:
==12246== in use at exit: 72,704 bytes in 1 blocks
==12246== total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==12246==
==12246== LEAK SUMMARY:
==12246== definitely lost: 0 bytes in 0 blocks
==12246== indirectly lost: 0 bytes in 0 blocks
==12246== possibly lost: 0 bytes in 0 blocks
==12246== still reachable: 72,704 bytes in 1 blocks
==12246== suppressed: 0 bytes in 0 blocks
==12246== Reachable blocks (those to which a pointer was found) are not shown.
==12246== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==12246==
==12246== For counts of detected and suppressed errors, rerun with: -v
==12246== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
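The numbers in the log are consistent with a stack overflow. The SP change is 0xfff0009b8 - 0xffe600470 = 0xA00548 = 10,487,112 bytes, which is exactly the --max-stackframe value Valgrind suggests, and it is larger than the 8,388,608-byte (8 MiB) main-thread stack, so one frame of test_throughput<10485760ul> cannot fit. The faulting address 0xffe60047c is only 0xc bytes above the new SP. A minimal sketch that checks this arithmetic at compile time is below; sp_delta and stack_soft_limit_bytes are hypothetical helper names, and the constants are copied from the output above.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/resource.h>

// Bytes the stack pointer moved between two samples. The x86-64 stack grows
// downward, so the earlier SP must be the larger address.
constexpr std::uint64_t sp_delta(std::uint64_t sp_before, std::uint64_t sp_after) {
    assert(sp_before >= sp_after);
    return sp_before - sp_after;
}

// Values copied from the Valgrind log.
constexpr std::uint64_t kSpBefore  = 0xfff0009b8ULL;
constexpr std::uint64_t kSpAfter   = 0xffe600470ULL;
constexpr std::uint64_t kMainStack = 8388608ULL;  // "main thread stack size used in this run"

// The delta equals Valgrind's suggested --max-stackframe value:
// 10 MiB (0xA00000, the template argument 10485760) plus 0x548 = 1352 more bytes.
static_assert(sp_delta(kSpBefore, kSpAfter) == 10487112ULL, "matches --max-stackframe");
static_assert(sp_delta(kSpBefore, kSpAfter) > kMainStack, "one frame exceeds the whole stack");

// Soft stack limit of the current process in bytes (`ulimit -s` shows the
// same limit in KiB). Returns 0 if getrlimit fails; rlim_cur is
// RLIM_INFINITY when the limit is unlimited.
std::uint64_t stack_soft_limit_bytes() {
    struct rlimit rl{};
    if (getrlimit(RLIMIT_STACK, &rl) != 0) return 0;
    return static_cast<std::uint64_t>(rl.rlim_cur);
}
```

If the static_asserts compile, the log's own numbers already show that a single call tries to reserve more stack than the main thread has.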
I found that the segfault was caused by a std::array that was too large for the stack. I've fixed it in #75.
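One common way to avoid this is to allocate the large buffer on the heap instead of the stack. The sketch below assumes the buffer is a std::array<char, N> (the element type isn't shown in this thread) and uses a hypothetical function, touch_buffer_on_heap, in place of test_throughput; it is not necessarily what #75 does.

```cpp
#include <array>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for the benchmark body; N is the buffer size in bytes.
template <std::size_t N>
std::size_t touch_buffer_on_heap() {
    // Stack version (overflows an 8 MiB stack when N is 10 MiB):
    //   std::array<char, N> buf;
    // make_unique value-initializes the array in heap storage, so only a
    // pointer lives in this stack frame.
    auto buf = std::make_unique<std::array<char, N>>();
    for (std::size_t i = 0; i < N; ++i) {
        (*buf)[i] = static_cast<char>(i & 0x7f);
    }
    // Sum the contents so the writes are observable to the caller.
    std::size_t sum = 0;
    for (char c : *buf) sum += static_cast<unsigned char>(c);
    return sum;
}
```

A std::vector<char> of size N works equally well; std::make_unique<std::array<char, N>>() keeps N in the type if other code relies on std::array. GCC's -Wframe-larger-than= warning can flag oversized frames like this at compile time.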