flame / blis

BLAS-like Library Instantiation Software Framework


thread barriers need backoff

jeffhammond opened this issue

The spin-wait below leads to serious problems when hardware threads are oversubscribed. Adding sched_yield() to the loop reduces the problem by one or two orders of magnitude, because it prompts the kernel to schedule another thread so that progress happens more quickly.

// If the current thread is NOT the last thread to have arrived, then
// it spins on the sense variable until that sense variable changes at
// which time these threads will exit the barrier.
while ( __atomic_load_n( &comm->barrier_sense, __ATOMIC_ACQUIRE ) == orig_sense )
     ; // Empty loop body.

I tried no-op instructions but those do not help in the oversubscribed case, because they don't trigger a context switch. Those backoffs are appropriate when memory access contention is the issue.

This is related but complementary to #603. This is a new version of #82.

My proposed fix will allow a user to disable sched_yield(), but I assert we need it enabled in the distribution builds of BLIS because quality of service is more important than the last bit of performance in the general case. Benchmarking use cases can disable it if it is expected to matter there.
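To make the idea concrete, here is a minimal sketch of a sense-reversing barrier whose wait loop yields by default and can be switched back to a pure spin at build time. It is not the actual BLIS patch: the struct sketch_comm_t, the functions sketch_comm_init() and sketch_barrier(), the field barrier_threads_arrived, and the macro SKETCH_BARRIER_USE_YIELD are all hypothetical names chosen for illustration. Only barrier_sense and the shape of the wait loop come from the snippet above. It assumes a POSIX system and the GCC/Clang __atomic builtins.

```c
#include <assert.h>
#include <sched.h>  // sched_yield() (POSIX)

// Hypothetical build-time switch, not a real BLIS configure option.
// Yielding is the default; compile with -DSKETCH_BARRIER_USE_YIELD=0
// to get a pure spin (e.g. for benchmarking without oversubscription).
#ifndef SKETCH_BARRIER_USE_YIELD
#define SKETCH_BARRIER_USE_YIELD 1
#endif

// Hypothetical stand-in for the communicator that owns the barrier state.
typedef struct
{
    int n_threads;                // number of threads sharing the barrier
    int barrier_threads_arrived;  // arrivals in the current barrier phase
    int barrier_sense;            // flipped by the last thread to arrive
} sketch_comm_t;

static void sketch_comm_init( sketch_comm_t* comm, int n_threads )
{
    assert( n_threads > 0 );
    comm->n_threads               = n_threads;
    comm->barrier_threads_arrived = 0;
    comm->barrier_sense           = 0;
}

static void sketch_barrier( sketch_comm_t* comm )
{
    // Record the sense before announcing arrival.
    int orig_sense = __atomic_load_n( &comm->barrier_sense, __ATOMIC_RELAXED );

    int arrived = __atomic_add_fetch( &comm->barrier_threads_arrived, 1,
                                      __ATOMIC_ACQ_REL );

    if ( arrived == comm->n_threads )
    {
        // Last thread to arrive: reset the counter, then flip the sense
        // to release the waiting threads.
        __atomic_store_n( &comm->barrier_threads_arrived, 0, __ATOMIC_RELAXED );
        __atomic_store_n( &comm->barrier_sense, !orig_sense, __ATOMIC_RELEASE );
    }
    else
    {
        // Not the last thread: wait until the sense variable changes.
        while ( __atomic_load_n( &comm->barrier_sense, __ATOMIC_ACQUIRE ) == orig_sense )
        {
#if SKETCH_BARRIER_USE_YIELD
            // Give up the CPU so that an oversubscribed core can run a
            // thread that has not yet reached the barrier.
            sched_yield();
#endif
        }
    }
}
```

Each participating thread would call sketch_barrier() on a shared sketch_comm_t that was initialized once with the thread count. Yielding on every iteration is the simplest policy; spinning a bounded number of times before the first yield is a possible refinement, and the right spin count would have to be measured per platform.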


@jeffhammond sched_yield is too heavyweight and not portable enough to be used all the time. I will update #82 with a general framework for config-specific behavior and then we can start filling in the actual implementation.

@jeffhammond suggestions for any specific architectures?