ecmwf / eckit

A C++ toolkit that supports development of tools and applications at ECMWF.

Home Page: https://confluence.ecmwf.int/display/eckit

Intermittent failures in eckit_test_container_sharedmemarray

DJDavies2 opened this issue

I am getting intermittent failures in the test eckit_test_container_sharedmemarray:

4/140 Test #3: eckit_test_container_sharedmemarray ........Subprocess aborted***Exception: 0.52 sec
Running 4 tests:
Running case 0: test_eckit_sharedmemarray_construction ...
semget failed: No space left on device
Test "test_eckit_sharedmemarray_construction" failed with unhandled eckit::Exception: Failed system call: semget (Invalid argument) @
Stack trace: backtrace [1] stack has 15 addresses
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::BackTrace::dumpabi:cxx11)0x48
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::Exception::Exception())0x93
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::FailedSystemCall::FailedSystemCall(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&))0x29
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::Semaphore::Semaphore(eckit::PathName const&, int))0x2ef
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/usr/lib64/libc.so.6+__libc_start_main)0xf5
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)

end of backtrace dump ...
Completed case 0: test_eckit_sharedmemarray_construction
Running case 1: test_eckit_sharedmemarray_checkvalues ...
semget failed: No space left on device
Test "test_eckit_sharedmemarray_checkvalues" failed with unhandled eckit::Exception: Failed system call: semget (No space left on device) @
Stack trace: backtrace [2] stack has 15 addresses
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::BackTrace::dumpabi:cxx11)0x48
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::Exception::Exception())0x93
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::FailedSystemCall::FailedSystemCall(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&))0x29
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::Semaphore::Semaphore(eckit::PathName const&, int))0x2ef
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/usr/lib64/libc.so.6+__libc_start_main)0xf5
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)

end of backtrace dump ...
Completed case 1: test_eckit_sharedmemarray_checkvalues
Running case 2: test_eckit_sharedmemarray_add_more ...
semget failed: No space left on device
Test "test_eckit_sharedmemarray_add_more" failed with unhandled eckit::Exception: Failed system call: semget (No space left on device) @
Stack trace: backtrace [3] stack has 15 addresses
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::BackTrace::dumpabi:cxx11)0x48
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::Exception::Exception())0x93
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::FailedSystemCall::FailedSystemCall(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&))0x29
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::Semaphore::Semaphore(eckit::PathName const&, int))0x2ef
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/usr/lib64/libc.so.6+__libc_start_main)0xf5
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)

end of backtrace dump ...
Completed case 2: test_eckit_sharedmemarray_add_more
Running case 3: test_eckit_sharedmemarray_checkvalues_2 ...
semget failed: No space left on device
Test "test_eckit_sharedmemarray_checkvalues_2" failed with unhandled eckit::Exception: Failed system call: semget (No space left on device) @
Stack trace: backtrace [4] stack has 15 addresses
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::BackTrace::dumpabi:cxx11)0x48
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::Exception::Exception())0x93
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::FailedSystemCall::FailedSystemCall(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&))0x29
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/lib/libeckit.so+eckit::Semaphore::Semaphore(eckit::PathName const&, int))0x2ef
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)
(/usr/lib64/libc.so.6+__libc_start_main)0xf5
(/home/h01/frwd/cylc-run/EckitFailure/share/build-mo-spice_gnu_debug/eckit/tests/container/eckit_test_container_sharedmemarray)

end of backtrace dump ...
Completed case 3: test_eckit_sharedmemarray_checkvalues_2
FAILED: test_eckit_sharedmemarray_construction
FAILED: test_eckit_sharedmemarray_checkvalues
FAILED: test_eckit_sharedmemarray_add_more
FAILED: test_eckit_sharedmemarray_checkvalues_2
4 tests failed out of 4.
terminate called after throwing an instance of 'eckit::FailedSystemCall'
what(): Failed system call: ::shm_unlink("/baz_hosts") in (/home/h01/frwd/cylc-run/EckitFailure/share/mo-bundle/eckit/tests/container/test_sharedmemarray.cc +107 main) (No such file or directory)

It happens on different platforms.

Does anyone have any ideas for this? I have googled it but the suggestions seem to involve kernel settings that require root access that I don't have.

@tlmquintino would you know?

There is a limit on the number of semaphores that can be allocated from user land.
That limit has probably been reached on that system for the user running the tests. This can happen for several reasons, including earlier test runs failing before they cleaned up their semaphores.
These tests need a few semaphores to work, so one option is for the user to clean up their semaphores before starting the tests (this has to be done outside the ctest environment).
On Linux, this can be done with the ipcrm command.

something like this (the awk pattern skips the header lines that ipcs prints):
for sem in $(ipcs -s | awk '$2 ~ /^[0-9]+$/ {print $2}'); do ipcrm -s "$sem"; done
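
To check whether the semaphore limit really is what the tests are hitting, current usage can be compared against the kernel limit without root access. The sketch below assumes Linux, where /proc/sys/kernel/sem holds four values (SEMMSL, SEMMNS, SEMOPM, SEMMNI) and semget fails with ENOSPC ("No space left on device") when creating a set would exceed SEMMNI or SEMMNS. The helper names semmni_from and count_sem_sets are made up for this illustration and are not part of eckit. The count may be lower than the true figure if ipcs does not list sets belonging to other users.

```shell
#!/bin/sh
# Hypothetical helper: print the SEMMNI field (maximum number of semaphore
# sets) from a line in /proc/sys/kernel/sem format:
#   "SEMMSL SEMMNS SEMOPM SEMMNI"
semmni_from() {
    # Unquoted on purpose: split the four limits on whitespace.
    set -- $1
    echo "$4"
}

# Hypothetical helper: count semaphore sets in `ipcs -s` style output read
# from stdin. Only rows whose second column is a numeric semid are counted,
# which skips the banner and column-header lines.
count_sem_sets() {
    awk '$2 ~ /^[0-9]+$/ { n++ } END { print n + 0 }'
}

# Report usage only where both the proc file and ipcs are available.
if [ -r /proc/sys/kernel/sem ] && command -v ipcs >/dev/null 2>&1; then
    limit=$(semmni_from "$(cat /proc/sys/kernel/sem)")
    used=$(ipcs -s | count_sem_sets)
    echo "semaphore sets in use: $used of $limit"
fi
```

If the reported usage is at or near the limit before the tests start, stale semaphore sets left over from earlier runs are a likely cause.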

Thanks, I've arranged for this to be addressed in the test environment.