abseil / abseil-cpp

Abseil Common Libraries (C++)

Home Page: https://abseil.io

Why do sleeping futex-based SpinLocks wake and spin again periodically?

kentonv opened this issue

I'm trying to understand this code:

ABSL_ATTRIBUTE_WEAK void ABSL_INTERNAL_C_SYMBOL(AbslInternalSpinLockDelay)(
    std::atomic<uint32_t> *w, uint32_t value, int loop,
    absl::base_internal::SchedulingMode) {
  absl::base_internal::ErrnoSaver errno_saver;
  struct timespec tm;
  tm.tv_sec = 0;
  tm.tv_nsec = absl::base_internal::SpinLockSuggestedDelayNS(loop);
  // Sleep until an explicit FUTEX_WAKE or until the suggested delay expires.
  syscall(SYS_futex, w, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, value, &tm);
}

ABSL_ATTRIBUTE_WEAK void ABSL_INTERNAL_C_SYMBOL(AbslInternalSpinLockWake)(
    std::atomic<uint32_t> *w, bool all) {
  // Wake either one waiter or all of them.
  syscall(SYS_futex, w, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, all ? INT_MAX : 1, 0);
}

SpinLock uses futexes for the slow path on Linux. The thread calling FUTEX_WAIT apparently sets a random timeout on the wait. When either the futex is woken or the timeout expires, the thread will wake up and try to take the lock again.

But AFAICT the thread that holds the lock will reliably call FUTEX_WAKE when releasing it. So it seems like there is no reason for the waiting thread to set a timeout. It will always get an explicit wake if the lock is available. If it wakes from the timeout, it will presumably always find the lock is still held, in which case it will SpinLoop() (burning CPU) and then (if it's still locked) go back to sleep.
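Concretely, dropping the timeout would just mean passing a null timeout to FUTEX_WAIT. Here is a minimal sketch of what such a delay function could look like; the name is made up, and it omits the ErrnoSaver and SchedulingMode details from the real function quoted above:

#include <atomic>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Hypothetical timeout-free variant of AbslInternalSpinLockDelay above.
// FUTEX_WAIT atomically re-checks that *w still equals `value` before
// sleeping, so if the holder has already released (and woken) by the time we
// get here, the call returns immediately; otherwise the thread sleeps until
// an explicit FUTEX_WAKE.
void SpinLockDelayNoTimeout(std::atomic<uint32_t> *w, uint32_t value) {
  syscall(SYS_futex, w, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, value,
          static_cast<const struct timespec *>(nullptr));
}

In other words, the kernel already closes the race between "check the lock word" and "go to sleep"; the only reason a waiter would ever need to wake on its own is if some path could release the lock without calling the wake function.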

This code appears to be ancient. The same code can be found in gperftools, where it dates back to at least 2009 -- before that, the code didn't use futexes at all and was just a sleep loop.

Is this timeout behavior vestigial and possibly no longer useful?

I came across this behavior while trying to debug a problem we see in production with tcmalloc: Occasionally, a server gets locked up with 1000+ threads all waiting on the global freelist's spinlock. It looks like this is not a deadlock, but a livelock. We suspect that when a very large number of threads are waiting on the same spinlock, the CPU gets saturated with threads running SpinLoop() while the thread that actually holds the lock struggles to make progress. We're going to try ripping out all the spinning code and see if that improves anything, but I also want to understand whether there's some rationale behind the approach here.
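To make the suspected cycle concrete, each waiter's slow path behaves roughly like the following caricature. This is an illustration only, with invented names and constants, not the actual SpinLock code:

#include <atomic>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

// Lock word for this sketch: 0 = free, 1 = held.
static void SpinBriefly() {
  // Stand-in for SpinLoop(): pure CPU burn.
  for (volatile int i = 0; i < 4096; ++i) {
  }
}

static long FakeSuggestedDelayNs(int loop) {
  // Stand-in for SpinLockSuggestedDelayNS: some bounded, loop-dependent delay.
  return 100 * 1000 * (1 + loop % 8);  // 0.1ms .. 0.8ms, purely illustrative
}

static void LockSlow(std::atomic<uint32_t> *w) {
  int loop = 0;
  uint32_t expected = 0;
  while (!w->compare_exchange_weak(expected, 1, std::memory_order_acquire)) {
    expected = 0;
    SpinBriefly();  // every waiter whose timeout fired burns CPU here, even
                    // though the lock is almost certainly still held
    struct timespec tm;
    tm.tv_sec = 0;
    tm.tv_nsec = FakeSuggestedDelayNs(++loop);
    // Sleep until an explicit wake *or* until the timeout expires; on a
    // timeout we loop around, spin again, fail the acquire, and sleep again.
    syscall(SYS_futex, w, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, /*val=*/1, &tm);
  }
}

With 1000+ threads in this loop, some subset of them is timing out at any given moment, so the spin phase is effectively running continuously across the machine even though only one thread can make progress.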

This does appear to be vestigial to me.

It looks like 44b0faf#diff-2d68645321260a286f9f39f0494e13bfcbb62b1aba31873946240c441222adda removed the possibility of a missed wakeup at the expense of a slightly slower Unlock path.
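For reference, the standard way to get "no missed wakeups at the cost of a slightly slower Unlock" is the contended-bit scheme described in Drepper's "Futexes Are Tricky": the lock word records whether waiters may exist, Unlock does an unconditional atomic exchange instead of a plain store, and it issues FUTEX_WAKE whenever a waiter might be present. A minimal sketch of that idea (my own simplification, not the code in that commit):

#include <atomic>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

constexpr uint32_t kFree = 0;           // unlocked
constexpr uint32_t kHeld = 1;           // locked, no waiters recorded
constexpr uint32_t kHeldContended = 2;  // locked, waiters may exist

void Lock(std::atomic<uint32_t> *w) {
  uint32_t c = kFree;
  // Fast path: uncontended acquire.
  if (w->compare_exchange_strong(c, kHeld, std::memory_order_acquire)) return;
  // Slow path: mark the lock contended, then sleep until it is released.
  // Acquiring with kHeldContended is conservative (it may cause one
  // unnecessary wake later), but it guarantees Unlock knows about us.
  while (w->exchange(kHeldContended, std::memory_order_acquire) != kFree) {
    // No timeout needed: FUTEX_WAIT returns immediately if *w is no longer
    // kHeldContended, and otherwise Unlock is guaranteed to wake us.
    syscall(SYS_futex, w, FUTEX_WAIT | FUTEX_PRIVATE_FLAG, kHeldContended,
            static_cast<const struct timespec *>(nullptr));
  }
}

void Unlock(std::atomic<uint32_t> *w) {
  // The unconditional exchange is the "slightly slower" part: it reports
  // whether anyone might be waiting, and if so we always issue a wake, so a
  // wakeup can never be lost.
  if (w->exchange(kFree, std::memory_order_release) == kHeldContended) {
    syscall(SYS_futex, w, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1, 0);
  }
}

Compared with the code quoted above, the key property is that a sleeping waiter never needs a timeout: any release that could matter to it is guaranteed to be followed by a wake.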

As far as I can tell, what happened here is that spinlock_linux.inc is actually a fork of our internal production copy, with the fiber scheduling code removed. We normally use a code stripping mechanism to avoid forks like this, but for some reason that didn't happen here. I'll see if the code can be merged and the timeout removed.

Note: This issue has been tracked for a while under b/209611780 and was probably also reported in google/tcmalloc#111.

Oh yeah, google/tcmalloc#111 (comment) is in fact describing the same problem that I am suspecting here, but I didn't actually understand the comment when I first read it. Hah.

(And yes, that thread was started by someone on my team.)

Interestingly, the missed-wakeup issue you reference seems to have been fixed in gperftools much earlier: gperftools/gperftools@560ca86