The test runner sometimes times out
joaodasilva opened this issue · comments
This doesn't happen very often but is very annoying in the GitHub workflows.
It's also very hard to reproduce locally, but happens after hundreds of runs like this:
$ while xvfb-run out-debug/windowjs tests/run_tests.js; do sleep 1; done
Super unclear why this happens though. It's likely a race or some nondeterminism in the subprocess spawning code, or in child/parent event dispatching.
The parent process isn't receiving the "exit" event from the child process.
The event is posted to the TaskQueue, but apparently that task never runs. This might be a bug in glfwPostEmptyEvent, where it fails to wake up a thread that isn't waiting yet. It needs further investigation, but this is super hard to reproduce locally.
Unfortunately, this is triggering very often on the Linux builder on GitHub...
Most likely, this is a bug in Xvfb.
Tracked the bug down to glfwPostEmptyEvent not waking up a blocked glfwWaitEvents; internally, the XSendEvent calls succeed but the blocking select() on the ConnectionNumber() fd never returns (until the test times out).
The bug also never repros with Xorg or Xquartz.
Going to try to run tests with another headless X server, like Xdummy.
This repros with Xdummy and Xquartz too, but at a lower frequency. So it's probably an issue with XSendEvent from a background thread instead.
It's likely glfw/glfw#1281.
Next attempt at a fix is to use a self-pipe to wake up select(), or use pselect and a signal to wake it up instead.
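A minimal sketch of the self-pipe idea, assuming a POSIX system. The names `wake_pipe`, `wake_pipe_init`, `wake_pipe_post` and `wait_for_event_or_wake` are hypothetical helpers made up for illustration; they are not part of GLFW or of this project. The `x11_fd` argument stands in for whatever `ConnectionNumber()` returns for the display. The point is that the wake-up byte stays in the pipe until it is read, so a wake-up posted before the waiting thread reaches select() should not get lost.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper type (not a GLFW API): both ends of the wake-up pipe. */
typedef struct {
    int read_fd;
    int write_fd;
} wake_pipe;

/* Makes fd non-blocking and close-on-exec. Returns 0 on success, -1 on error. */
static int set_nonblocking_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return -1;
    flags = fcntl(fd, F_GETFD, 0);
    if (flags == -1 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1) return -1;
    return 0;
}

/* Creates the pipe. Returns 0 on success, -1 on error. */
int wake_pipe_init(wake_pipe *wp) {
    int fds[2];
    if (pipe(fds) == -1) return -1;
    if (set_nonblocking_cloexec(fds[0]) == -1 ||
        set_nonblocking_cloexec(fds[1]) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    wp->read_fd = fds[0];
    wp->write_fd = fds[1];
    return 0;
}

/* Callable from any thread: writes one byte to make the pipe readable.
 * A full pipe (EAGAIN) means a wake-up is already pending, so that is fine. */
int wake_pipe_post(const wake_pipe *wp) {
    const char byte = 0;
    for (;;) {
        ssize_t n = write(wp->write_fd, &byte, 1);
        if (n == 1) return 0;
        if (n == -1 && errno == EINTR) continue;
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) return 0;
        return -1;
    }
}

/* Blocks until x11_fd is readable, the pipe is written to, or the timeout
 * expires (timeout may be NULL to block forever; x11_fd may be -1 to skip it).
 * Returns 1 if woken through the pipe, 0 otherwise (X11 data, timeout, or an
 * interrupted call treated as a spurious wake-up), and -1 on error. */
int wait_for_event_or_wake(int x11_fd, const wake_pipe *wp,
                           struct timeval *timeout) {
    fd_set fds;
    int maxfd = wp->read_fd;
    FD_ZERO(&fds);
    FD_SET(wp->read_fd, &fds);
    if (x11_fd >= 0) {
        FD_SET(x11_fd, &fds);
        if (x11_fd > maxfd) maxfd = x11_fd;
    }
    int rc = select(maxfd + 1, &fds, NULL, NULL, timeout);
    if (rc == -1) return errno == EINTR ? 0 : -1;
    if (rc == 0) return 0;
    if (FD_ISSET(wp->read_fd, &fds)) {
        char buf[64];
        /* Drain every pending byte so the next wait blocks again. */
        while (read(wp->read_fd, buf, sizeof buf) > 0) {}
        return 1;
    }
    return 0;
}
```

In this sketch the background thread would call `wake_pipe_post()` after queuing its task, and the main loop would call `wait_for_event_or_wake()` in place of the plain select() on the X11 fd. Whether this can be wired into GLFW's wait path without patching GLFW is an open question here.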