jcarreira / cirrus-kv

High-performance key-value store



Segfault in Throughput.cpp

TylerADavis opened this issue

The throughput.cpp benchmark crashes with a segfault when it reaches the 10 MB put test.

The offending line is

test_throughput<10   * 1024 * 1024>(num_runs / 100);

GDB output: [screenshot taken 2017-07-10 at 12:35:15 PM]

It's not clear what is causing this; memory corruption is one possibility. Valgrind can help here (though the ibverbs stack emits a lot of false positives).

I'll give that a try. This error happens on TCP as well, so I'll try running it without the RDMA stack.

I ran it under Valgrind, and there seem to be no memory errors until the segfault itself. The output is below. Is the warning about the SP change a sign of a stack overflow? Valgrind does mention that as a possibility.

[tylerdavis@f1:/data/tyler/ddc]$ valgrind --leak-check=yes ./benchmarks/throughput
==12246== Memcheck, a memory error detector
==12246== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==12246== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==12246== Command: ./benchmarks/throughput
==12246== 
==12246== Warning: client switching stacks?  SP change: 0xfff0009b8 --> 0xffe600470
==12246==          to suppress, use: --max-stackframe=10487112 or greater
==12246== Invalid write of size 4
==12246==    at 0x405067: void test_throughput<10485760ul>(int) (throughput.cpp:41)
==12246==  Address 0xffe60047c is on thread 1's stack
==12246== 
==12246== 
==12246== Process terminating with default action of signal 11 (SIGSEGV)
==12246==  Access not within mapped region at address 0xFFE60047C
==12246==    at 0x405067: void test_throughput<10485760ul>(int) (throughput.cpp:41)
==12246==  If you believe this happened as a result of a stack
==12246==  overflow in your program's main thread (unlikely but
==12246==  possible), you can try to increase the size of the
==12246==  main thread stack using the --main-stacksize= flag.
==12246==  The main thread stack size used in this run was 8388608.
==12246== Invalid write of size 8
==12246==    at 0x4A28680: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==12246==  Address 0xffe600468 is on thread 1's stack
==12246== 
==12246== 
==12246== Process terminating with default action of signal 11 (SIGSEGV)
==12246==  Access not within mapped region at address 0xFFE600468
==12246==    at 0x4A28680: _vgnU_freeres (in /usr/lib/valgrind/vgpreload_core-amd64-linux.so)
==12246==  If you believe this happened as a result of a stack
==12246==  overflow in your program's main thread (unlikely but
==12246==  possible), you can try to increase the size of the
==12246==  main thread stack using the --main-stacksize= flag.
==12246==  The main thread stack size used in this run was 8388608.
==12246== 
==12246== HEAP SUMMARY:
==12246==     in use at exit: 72,704 bytes in 1 blocks
==12246==   total heap usage: 1 allocs, 0 frees, 72,704 bytes allocated
==12246== 
==12246== LEAK SUMMARY:
==12246==    definitely lost: 0 bytes in 0 blocks
==12246==    indirectly lost: 0 bytes in 0 blocks
==12246==      possibly lost: 0 bytes in 0 blocks
==12246==    still reachable: 72,704 bytes in 1 blocks
==12246==         suppressed: 0 bytes in 0 blocks
==12246== Reachable blocks (those to which a pointer was found) are not shown.
==12246== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==12246== 
==12246== For counts of detected and suppressed errors, rerun with: -v
==12246== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

I found that the segfault was caused by a std::array that was too large for the stack. I've fixed it in #75.