JCoz sometimes reports strange delays in experiment results
AlexVanGogen opened this issue · comments
The total delay reported in an experiment result is sometimes implausible. For example, baseline experiments can show non-zero delays, and once there was a delay greater than the experiment duration itself.
It looks like in these cases a thread never handles any of the signals that reset thread-local delays. Such a thread can take a very long time to handle the last signal it received during an experiment, long enough for the next experiment to be set up. The next experiment then runs with stale thread-local delays, which skews the global delay. And if some other thread has already reset its local delay to zero, it will go to sleep in the signal handler even though it isn't supposed to.
I think the solution here is to use a thread barrier between signaled user threads and the agent thread running an experiment.
One simple thing we can do here is add an atomic counter that the agent thread initializes to 0 before signaling, that each user thread atomically increments before returning from its signal handler, and that the agent thread waits on before returning from signal_user_threads.
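A rough sketch of that counter idea, not JCoz's actual code. Only the name `signal_user_threads` comes from this thread. Its signature here, `handlers_done`, `local_delay`, `reset_handler`, `worker`, `demo`, and the choice of `SIGUSR1` are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <vector>

// Hypothetical stand-in for the per-thread delay that the handler resets.
static thread_local long local_delay = 0;

// Counts handlers that finished since the agent last signaled.
// Lock-free atomics are safe to touch from a signal handler.
static std::atomic<int> handlers_done{0};
static std::atomic<bool> stop_workers{false};

extern "C" void reset_handler(int) {
    local_delay = 0;  // drop any stale delay from the previous experiment
    // Release pairs with the acquire load in signal_user_threads().
    handlers_done.fetch_add(1, std::memory_order_release);
}

// Assumed signature: signal every user thread, then block until each
// signaled thread has run the handler to completion.
void signal_user_threads(const std::vector<pthread_t>& threads) {
    handlers_done.store(0, std::memory_order_relaxed);
    int sent = 0;
    for (pthread_t t : threads) {
        if (pthread_kill(t, SIGUSR1) == 0) ++sent;  // count delivered signals only
    }
    while (handlers_done.load(std::memory_order_acquire) < sent) {
        sched_yield();  // agent waits here instead of preparing the next experiment
    }
}

// Stand-in for a user thread: spins until told to stop.
static void* worker(void*) {
    while (!stop_workers.load(std::memory_order_acquire)) sched_yield();
    return nullptr;
}

// Starts n workers, signals them once, and returns how many handlers
// had completed by the time signal_user_threads() returned.
int demo(int n) {
    struct sigaction sa{};
    sa.sa_handler = reset_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGUSR1, &sa, nullptr);

    stop_workers.store(false);
    std::vector<pthread_t> threads(n);
    for (pthread_t& t : threads) {
        int rc = pthread_create(&t, nullptr, worker, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    signal_user_threads(threads);
    int acked = handlers_done.load();

    stop_workers.store(true, std::memory_order_release);
    for (pthread_t& t : threads) pthread_join(t, nullptr);
    return acked;
}
```

One caveat with this sketch: if a user thread has the signal blocked or exits between being listed and handling it, the agent would spin forever, so a real version probably needs a timeout or a way to drop dead threads from the count.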