pmem / pmemkv

Key/Value Datastore for Persistent Memory

Home Page: https://pmem.io

vcmap__concurrent_put_get_remove randomly fails

kilobyte opened this issue · comments

Environment Information

Name: Version
pmemkv version(s): 1.5.0
libpmemobj-cpp version(s): 1.13.0
PMDK (libpmem/libpmemobj) version(s): 1.11.1
OS(es) version(s): Debian unstable
kernel version(s): 5.10
TBB version(s):
memkind version(s): 1.12.0
ndctl version(s): 71.1

Please provide a reproduction of the bug:

On the Reproducible Builds CI, this test fails pretty often. On my home box, it has never failed despite many, many tries.

Link: https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/pmemkv.html (the page is live, so it may no longer show the failure if the CI has run again since).

-- Executing:  /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params vcmap;{"path":"/build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/test/vcmap__concurrent_put_get_remove_single_op_params__default_1000_0_none","size":125829120};1000
-- Test vcmap__concurrent_put_get_remove_single_op_params__default_1000_0_none:
-- Stdout:

Signal: Aborted, backtrace:

Signal: Aborted, backtrace:

Signal: Aborted, backtrace:

Signal: Aborted, backtrace:

Signal: Aborted, backtrace:
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]

Signal: Aborted, backtrace:
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]

Signal: Aborted, backtrace:
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
2: /lib/x86_64-linux-gnu/libc.so.6 (gsignal+0x141) [0x7f18eed6c891]
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (_ZNKSt10error_code23default_error_conditionEv+0x34) [0x7f18eefd3914]
7: /lib/x86_64-linux-gnu/libpthread.so.0 (start_thread+0xe0) [0x7f18ef1b8d80]
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]

Signal: Aborted, backtrace:
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]

Signal: Aborted, backtrace:
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
1: /lib/x86_64-linux-gnu/libc.so.6 (killpg+0x40) [0x7f18eed6c94f]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
8: /lib/x86_64-linux-gnu/libc.so.6 (clone+0x3f) [0x7f18eee2cba8]

3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (_ZNKSt10error_code23default_error_conditionEv+0x34) [0x7f18eefd3914]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
3: /lib/x86_64-linux-gnu/libc.so.6 (abort+0x112) [0x7f18eed56536]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
5: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (_ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZL25MultithreadedPutAndRemovemRN4pmem2kv2dbEEUlmE_mEEEEE6_M_runEv+0xbc) [0x555c93c44f4c]
6: /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (_ZNKSt10error_code23default_error_conditionEv+0x34) [0x7f18eefd3914]

Signal: Aborted, backtrace:
0: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (test_sighandler+0x23) [0x555c93c48633]
4: /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params (UT_FATAL+0xc1) [0x555c93c44d81]
7: /lib/x86_64-linux-gnu/libpthread.so.0 (start_thread+0xe0) [0x7f18ef1b8d80]
8: /lib/x86_64-linux-gnu/libc.so.6 (clone+0x3f) [0x7f18eee2cba8]


-- Stderr:
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc
./tests/engine_scenarios/concurrent/put_get_remove_single_op_params.cc:69 operator() - assertion failure: kv.put(uint64_to_strv(keys[thread_id]), uint64_to_strv(keys[thread_id])) (0x8) == status::OK (0x0), errormsg: [pmemkv_put] std::bad_alloc

CMake Error at /build/1st/pmemkv-1.5.0/tests/helpers.cmake:185 (message):
   /build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/tests/concurrent_put_get_remove_single_op_params vcmap;{"path":"/build/1st/pmemkv-1.5.0/obj-x86_64-linux-gnu/test/vcmap__concurrent_put_get_remove_single_op_params__default_1000_0_none","size":125829120};1000 failed: 134
Call Stack (most recent call first):
  /build/1st/pmemkv-1.5.0/tests/helpers.cmake:226 (execute_common)
  /build/1st/pmemkv-1.5.0/tests/engines/memkind_based/default.cmake:9 (execute)
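
A side note on reading the numbers in the CMake error above, offered as a sketch and not as anything taken from the pmemkv test helpers. Assuming the common convention that an exit status above 128 encodes 128 plus a signal number, "failed: 134" would correspond to signal 6, which is SIGABRT on Linux and is consistent with the "Signal: Aborted" lines in the stdout. The "size" value in the engine config, 125829120 bytes, is exactly 120 MiB. The helper name `decode_exit_status` is made up for illustration.

```python
import signal


def decode_exit_status(status: int):
    """Return the signal implied by a 128+N style exit status, or None.

    Assumption: the status follows the usual shell convention; the test
    helpers in the log may or may not produce it that way.
    """
    if status > 128:
        try:
            return signal.Signals(status - 128)
        except ValueError:
            return None
    return None


# Value taken from the "failed: 134" line in the log.
sig = decode_exit_status(134)
print("exit status 134 ->", sig.name if sig else "not a signal exit")

# "size" from the engine config JSON in the log.
pool_size_bytes = 125829120
print("pool size:", pool_size_bytes / (1024 * 1024), "MiB")
```

If the 120 MiB pool is shared by every worker thread the test starts, that would be one thing to look at when chasing the std::bad_alloc, but the log alone does not confirm this.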

@kilobyte, how many cores (as reported by nproc) and how much free memory does this machine have?

Damned if I know... I can print that data, what commands would be useful in your opinion?

I believe nproc and free -g should be enough
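
For a CI job where it is easier to add a script step than to run commands by hand, roughly the same information can be collected as sketched below. This is an assumption-laden sketch for Linux only: the function names `available_cpus` and `parse_meminfo` are hypothetical, the CPU count uses the process affinity mask (close to what nproc reports), and the memory figures are read from /proc/meminfo instead of calling free.

```python
import os


def available_cpus() -> int:
    """CPUs usable by this process, close to what `nproc` prints on Linux."""
    try:
        return len(os.sched_getaffinity(0))
    except AttributeError:  # sched_getaffinity is not available everywhere
        return os.cpu_count() or 1


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


if __name__ == "__main__":
    print("cpus:", available_cpus())
    try:
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
    except OSError:
        mem = {}  # not on Linux, or /proc is not mounted
    for key in ("MemTotal", "MemAvailable", "SwapTotal", "SwapFree"):
        if key in mem:
            # /proc/meminfo reports these values in kB
            print(f"{key}: {mem[key] / 1024 / 1024:.1f} GiB")
```

Printing this at the start of the test run would put the core count and memory next to the failure in the same log.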

@kilobyte bump - can you please provide the data I asked for? That'd be appreciated 😃

Can you also please update memkind to a more recent version (ideally 1.14.0)?

nproc is 15 in build1, 16 in build2; memory for both: 48G RAM + 200G swap; filesystem: tmpfs 24G
The results are random; currently both runs have succeeded.

I even see a comment on the RB page:

Comments: rpath issue fixed by -DCMAKE_BUILD_RPATH_USE_ORIGIN=ON, but unable to verify effect on test suite due to nondeterministic failures.

So others see random failures too.

... so I was about to report that the failure is gone, but it turns out all my recent test runs pass -E 'vcmap__concurrent_put_get_remove_.*', which skips these tests, so RB no longer failing is not as good as it seems.
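
To make the effect of that exclusion pattern concrete, here is a rough sketch of which test names it would filter out. It assumes the -E option is an exclude-by-regex filter that matches anywhere in the test name, and it uses Python's `re` module as a stand-in for the test runner's own regex engine, which may differ in dialect. The first test name comes from the log above; the second is hypothetical.

```python
import re

# Pattern copied from the comment above.
EXCLUDE = re.compile(r"vcmap__concurrent_put_get_remove_.*")


def is_excluded(test_name: str) -> bool:
    """True if the exclusion pattern matches anywhere in the test name."""
    return EXCLUDE.search(test_name) is not None


names = [
    # Name taken from the failing test in the log.
    "vcmap__concurrent_put_get_remove_single_op_params__default_1000_0_none",
    # Hypothetical name, only here to show a test that is not filtered out.
    "vcmap__put_get_remove__hypothetical",
]

for name in names:
    print(name, "->", "excluded" if is_excluded(name) else "runs")
```

Under these assumptions the failing test never runs at all, so a green build says nothing about whether the underlying problem still exists.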