ocaml-multicore / multicoretests

PBT testsuite and libraries for testing multicore OCaml

Home page: https://ocaml-multicore.github.io/multicoretests/

[ocaml5-issue] Assertion failure `s->running` during parallel `STM` or `Lin` tests

jmid opened this issue

Merging #445 into main triggered an assertion failure and abort on Linux trunk during the `STM Out_channel test parallel` test:
https://github.com/ocaml-multicore/multicoretests/actions/runs/8441854686/job/23121952174

```
random seed: 115742799
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test sequential (generating)
[✓] 1000    0    0 1000 / 1000     3.7s STM Out_channel test sequential

[02] file runtime/domain.c; line 326 ### Assertion failed: s->running
File "src/io/dune", line 40, characters 7-16:
40 |  (name stm_tests)
            ^^^^^^^^^
(cd _build/default/src/io && ./stm_tests.exe --verbose)
Command got signal ABRT.
[ ]    0    0    0    0 / 1000     0.0s STM Out_channel test parallel
```
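For anyone trying to reproduce this outside the test suite, the sketch below is a hypothetical stand-alone stress loop loosely in the spirit of a parallel test: each round spawns two fresh domains that write to one shared `Out_channel`, then joins them. It is not the multicoretests `STM` harness, the names `writer` and `stress` and the round/write counts are made up for illustration, and it is not known to trigger the `s->running` assertion. Building it against the debug runtime (for example with `ocamlopt -runtime-variant d`) is what would enable runtime assertions such as the one above.

```ocaml
(* Hypothetical stress loop, NOT the multicoretests STM harness.
   Requires OCaml 5 (Domain); Out_channel is in Stdlib since 4.14. *)

(* Each domain appends [n] short lines to the shared channel and flushes.
   The OCaml 5 runtime locks channels, so lines may interleave but the
   channel itself stays consistent. *)
let writer oc n =
  for i = 1 to n do
    Out_channel.output_string oc (string_of_int i ^ "\n");
    Out_channel.flush oc
  done

(* Run [rounds] iterations; each spawns two domains that write
   concurrently, then joins both and closes the channel.
   Returns the number of completed rounds. *)
let stress ~rounds ~writes =
  let path = Filename.temp_file "stress" ".txt" in
  let completed = ref 0 in
  for _ = 1 to rounds do
    let oc = Out_channel.open_text path in
    let d1 = Domain.spawn (fun () -> writer oc writes) in
    let d2 = Domain.spawn (fun () -> writer oc writes) in
    Domain.join d1;
    Domain.join d2;
    Out_channel.close oc;
    incr completed
  done;
  Sys.remove path;
  !completed

let () =
  let n = stress ~rounds:200 ~writes:50 in
  Printf.printf "completed %d rounds\n" n
```

The design choice here is to create and tear down domains many times rather than keep two long-lived workers, since repeated domain startup and termination is the part of the runtime that `runtime/domain.c` manages.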

Saw this again in focused tests on #304, in the Linux 5.3.0+trunk debug job, this time during `STM Sys test parallel`:
https://github.com/ocaml-multicore/multicoretests/actions/runs/9131253128/job/25110039250?pr=304

```
Starting 6-th run

random seed: 357814880
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential
[ ]    0    0    0    0 / 1000     0.0s STM Sys test sequential (generating)
[✓] 1000    0    0 1000 / 1000     9.4s STM Sys test sequential

[ ]    0    0    0    0 / 2500     0.0s STM Sys test parallel
[02] file runtime/domain.c; line 325 ### Assertion failed: s->running
/usr/bin/bash: line 1: 1943510 Aborted                 (core dumped) ./focusedtest.exe -v
[ ]  559    0    0  559 / 2500    50.7s STM Sys test parallel
```

I just observed this locally on Linux with OCaml 5.2.0, while trying a run with an extreme `space_overhead` (`o=20`) and the debug runtime to see whether it would reveal anything:

```
multicoretests$ OCAMLRUNPARAM="s=4096,o=20,v=0,V=1" dune build "@ci" -j1 --no-buffer --display=quiet --cache=disabled --error-reporting=twice --profile=debug-runtime src/
[...]
random seed: 446171203
generated error fail pass / total     time test name
[ ]    1    0    0    1 / 1000     0.5s Lin In_channel test with Domain (shrinking:   11.0003)[01] file runtime/domain.c; line 336 ### Assertion failed: s->running
File "src/io/dune", line 21, characters 7-23:
21 |  (name lin_tests_domain)
            ^^^^^^^^^^^^^^^^
Command got signal ABRT.
```
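When experimenting with `OCAMLRUNPARAM` settings like the ones above, it can help to confirm which GC parameters the runtime actually picked up. A minimal sketch using the standard `Gc.get` function: `s` sets the minor heap size (in words), `o` sets `space_overhead`, and `v` sets the GC verbosity. The `V=1` flag is not reflected in `Gc.control` and is not checked here.

```ocaml
(* Print the GC settings the running program actually uses, e.g. to check
   that OCAMLRUNPARAM="s=4096,o=20,v=0" took effect. *)
let () =
  let c = Gc.get () in
  Printf.printf "minor_heap_size=%d words, space_overhead=%d, verbose=%d\n"
    c.Gc.minor_heap_size c.Gc.space_overhead c.Gc.verbose
```

Running the compiled program once with and once without `OCAMLRUNPARAM` set shows whether the variable is being passed through to the test executables (dune actions inherit the environment by default).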