[ocaml5-issue] Assertion failure `s->running` during parallel `STM` or `Lin` tests
jmid opened this issue
Jan Midtgaard commented
The merge of #445 to `main` triggered an assertion failure and abort on Linux trunk during `STM Out_channel test parallel`:
https://github.com/ocaml-multicore/multicoretests/actions/runs/8441854686/job/23121952174
```
random seed: 115742799
generated error fail pass / total     time test name
[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test sequential (generating)
[✓] 1000    0    0 1000 / 1000     3.7s STM Out_channel test sequential
[02] file runtime/domain.c; line 326 ### Assertion failed: s->running
File "src/io/dune", line 40, characters 7-16:
40 |  (name stm_tests)
            ^^^^^^^^^
(cd _build/default/src/io && ./stm_tests.exe --verbose)
Command got signal ABRT.
[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test parallel
```
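Roughly speaking, a parallel `STM` test runs two command sequences concurrently in two spawned domains against a shared system under test, here an `Out_channel`. As a hypothetical and much simplified illustration of that shape (not the multicoretests implementation, and not known to reproduce the `s->running` assertion), the sketch below releases two domains together through an `Atomic` spin barrier, lets both write lines to one shared `Out_channel`, and checks after joining that no line was lost. All names (`worker`, `run_round`, `count_lines`, `lines_per_domain`) are made up for this example.

```ocaml
(* Hypothetical stress sketch, NOT the multicoretests STM implementation:
   two domains write concurrently to one shared Out_channel after being
   released together by an Atomic-based spin barrier. *)

let lines_per_domain = 1_000

(* Each worker announces itself, spins until both have arrived, then writes. *)
let worker ready oc tag () =
  Atomic.incr ready;
  while Atomic.get ready < 2 do Domain.cpu_relax () done;
  for i = 1 to lines_per_domain do
    Out_channel.output_string oc (Printf.sprintf "%s %d\n" tag i)
  done

(* One round: spawn both writers on a fresh temp file, join them,
   close the channel and return the file name. *)
let run_round () =
  let path = Filename.temp_file "stress" ".txt" in
  let oc = Out_channel.open_text path in
  let ready = Atomic.make 0 in
  let d1 = Domain.spawn (worker ready oc "a") in
  let d2 = Domain.spawn (worker ready oc "b") in
  Domain.join d1;
  Domain.join d2;
  Out_channel.close oc;
  path

(* Count the newline-terminated lines in a file. *)
let count_lines path =
  In_channel.with_open_text path (fun ic ->
    let rec loop n =
      match In_channel.input_line ic with
      | Some _ -> loop (n + 1)
      | None -> n
    in
    loop 0)

(* Repeat a few rounds, checking that every written line arrived. *)
let () =
  for _round = 1 to 10 do
    let path = run_round () in
    assert (count_lines path = 2 * lines_per_domain);
    Sys.remove path
  done
```

Linking such a loop against the debug runtime (for example with `ocamlopt -runtime-variant d`) means the runtime's own assertions, like the one in `runtime/domain.c`, are checked while it runs.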
Jan Midtgaard commented
Saw this again in the focused tests on #304 (Linux 5.3.0+trunk debug), this time on `STM Sys test parallel`:
https://github.com/ocaml-multicore/multicoretests/actions/runs/9131253128/job/25110039250?pr=304
```
Starting 6-th run
random seed: 357814880
generated error fail pass / total     time test name
[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential (generating)
[✓] 1000    0    0 1000 / 1000     9.4s STM Sys test sequential
[ ]    0    0    0    0 / 2500     0.0s STM Sys test parallel
[02] file runtime/domain.c; line 325 ### Assertion failed: s->running
/usr/bin/bash: line 1: 1943510 Aborted                 (core dumped) ./focusedtest.exe -v
[ ]  559    0    0  559 / 2500    50.7s STM Sys test parallel
```
Jan Midtgaard commented
I just observed this locally on Linux running 5.2.0, while trying a run with an extreme `space_overhead` (`o=20`) and the debug runtime to see whether that would reveal anything:
```
multicoretests$ OCAMLRUNPARAM="s=4096,o=20,v=0,V=1" dune build "@ci" -j1 --no-buffer --display=quiet --cache=disabled --error-reporting=twice --profile=debug-runtime src/
[...]
random seed: 446171203
generated error fail pass / total     time test name
[ ]    1    0    0    1 / 1000     0.5s Lin In_channel test with Domain (shrinking: 11.0003)[01] file runtime/domain.c; line 336 ### Assertion failed: s->running
File "src/io/dune", line 21, characters 7-23:
21 |  (name lin_tests_domain)
            ^^^^^^^^^^^^^^^^
Command got signal ABRT.
```
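As far as we understand the OCaml 5 runtime, the `OCAMLRUNPARAM` string above sets `s` (initial minor heap size, in words), `o` (the major GC's `space_overhead`; a low value like 20 makes the major GC work much harder), `v` (GC verbosity) and `V` (heap verification). For experiments inside a single test program, the first three can be approximated with `Gc.set`. This is a sketch under stated assumptions: `OCAMLRUNPARAM` is read before any OCaml code runs, whereas `Gc.set` takes effect only once called, and `V=1` has no `Gc.control` counterpart, so it is not modelled. The name `configure_gc` is hypothetical.

```ocaml
(* Rough, in-program approximation of OCAMLRUNPARAM="s=4096,o=20,v=0".
   Assumption: applying these with Gc.set after startup is close enough
   for experiments; V=1 (heap verification) has no Gc.control field. *)

let configure_gc () =
  Gc.set
    { (Gc.get ()) with
      Gc.minor_heap_size = 4096;  (* s=4096: minor heap size in words; may be rounded up *)
      Gc.space_overhead = 20;     (* o=20: far more aggressive major GC *)
      Gc.verbose = 0 }            (* v=0: no GC diagnostic messages *)

let () =
  configure_gc ();
  let c = Gc.get () in
  Printf.printf "minor_heap_size=%d space_overhead=%d verbose=%d\n"
    c.Gc.minor_heap_size c.Gc.space_overhead c.Gc.verbose
```

The runtime may round the minor heap size up to a multiple of the page size, so the printed `minor_heap_size` can exceed 4096.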