Shopify / gvltools

Set of GVL instrumentation tools



Waiting threads underflow/miscount when disabled/reset

unflxw opened this issue


These are two issues that occur with the waiting threads counter when it is disabled or reset. I'm reporting them together as they seem similar, but feel free to split this into separate issues.

Reset while waiting

If the waiting threads counter is reset while a thread is waiting, the counter will underflow when that thread resumes.

Reproduction case
require "gvltools"

GVLTools::WaitingThreads.enable

def fibonacci(number)
  number <= 1 ? number : fibonacci(number - 1) + fibonacci(number - 2)
end

threads = 5.times.map do
  Thread.new do
    5.times do
      fibonacci(30)
    end
    puts "Waiting threads is #{GVLTools::WaitingThreads.count}; resetting..."
    GVLTools::WaitingThreads.reset
  end
end

threads.each(&:join)

I suppose this could be addressed by saving the counter's value at the time of the reset, then ignoring that many of the resume events that follow.
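The suggestion above can be sketched as a small pure-Ruby model. This is not the gvltools API: `SkippingWaitingCounter` and its `ready`, `resumed`, `reset` and `count` methods are hypothetical, with `ready` standing for a thread starting to wait for the GVL and `resumed` for that thread acquiring it again.

```ruby
# Hypothetical pure-Ruby model of the waiting threads counter, not the
# gvltools API. `ready` stands for a thread starting to wait for the GVL,
# `resumed` for that thread acquiring it again.
class SkippingWaitingCounter
  def initialize
    @mutex = Mutex.new
    @count = 0
    @skip = 0 # resume events still owed by threads that were waiting at reset
  end

  def ready
    @mutex.synchronize { @count += 1 }
  end

  def resumed
    @mutex.synchronize do
      if @skip > 0
        # Some thread waiting at reset time has yet to resume; assume this is
        # one of them and drop the event instead of decrementing.
        @skip -= 1
      else
        @count -= 1
      end
    end
  end

  def reset
    @mutex.synchronize do
      @skip += @count # remember how many resume events to ignore
      @count = 0
    end
  end

  def count
    @mutex.synchronize { @count }
  end
end

counter = SkippingWaitingCounter.new
counter.ready      # thread A starts waiting
counter.ready      # thread B starts waiting
counter.reset      # count becomes 0, the next two resumes will be ignored
counter.resumed    # A resumes: ignored
counter.ready      # thread C starts waiting
counter.resumed    # B resumes: ignored
puts counter.count # => 1 (only C is waiting)
```

Because resume events are not tied to a particular thread in this model, a thread that starts waiting after the reset and resumes before the pre-reset waiters can have its resume swallowed, so the count may read high for a while. It never goes below zero, and it is exact again once every thread that was waiting at reset time has resumed.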

Disable while waiting

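If the instrumentation is disabled while a thread is waiting, that thread's resume event is never counted, so the counter's "baseline" drifts away from zero. Conversely, if the instrumentation is enabled while a thread is waiting, that thread's resume event has no matching ready event and the counter underflows.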
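Both failure modes can be reproduced deterministically with a minimal sequential model, assuming the hooks simply ignore events while the instrumentation is disabled. `NaiveWaitingCounter` is hypothetical and is not the gvltools implementation; it is also not thread-safe, since it is only meant to replay a fixed sequence of events.

```ruby
# Hypothetical model of a counter whose hooks ignore events while disabled.
# Sequential on purpose: it replays a fixed order of ready/resumed events.
class NaiveWaitingCounter
  attr_reader :count

  def initialize
    @enabled = false
    @count = 0
  end

  def enable
    @enabled = true
  end

  def disable
    @enabled = false
  end

  def ready
    @count += 1 if @enabled
  end

  def resumed
    @count -= 1 if @enabled
  end
end

# Disabled while a thread is waiting: the resume is lost and the baseline drifts.
drift = NaiveWaitingCounter.new
drift.enable
drift.ready      # a thread starts waiting
drift.disable
drift.resumed    # ignored, instrumentation is off
drift.enable
puts drift.count # => 1, although no thread is waiting

# Enabled while a thread is waiting: the resume has no matching ready event.
under = NaiveWaitingCounter.new
under.ready      # ignored, instrumentation is off
under.enable
under.resumed
puts under.count # => -1 (an unsigned native counter would wrap around instead)
```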

Reproduction case
require "gvltools"

def fibonacci(number)
  number <= 1 ? number : fibonacci(number - 1) + fibonacci(number - 2)
end

2.times do
  puts "Enabling waiting threads"
  GVLTools::WaitingThreads.enable

  threads = 5.times.map do
    Thread.new do
      5.times do
        fibonacci(30)
      end

      if GVLTools::WaitingThreads.enabled?
        puts "Waiting threads is #{GVLTools::WaitingThreads.count}; disabling..."
        GVLTools::WaitingThreads.disable
      end
    end
  end

  threads.each(&:join)
end

puts
puts "Final waiting threads is #{GVLTools::WaitingThreads.count}"

This can be fixed in two ways, depending on which behaviour is deemed correct:

  • The counter should never change after the instrumentation is disabled: in this case, enabling the instrumentation should always reset the counter, as its previous value cannot be trusted to be zero.
  • Even after the instrumentation is disabled, the resumed events corresponding to previously instrumented ready events should still be accounted for: always decrement the waiting threads counter, even if the instrumentation is disabled.
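The two options above can be compared with a pair of hypothetical pure-Ruby models; neither class is part of gvltools. For the second option, a bare "always decrement" would also decrement for threads whose ready event was never counted, so this sketch additionally remembers which threads were counted. That per-thread bookkeeping is an assumption beyond the issue's wording, and a native implementation would need some per-thread storage to do the same.

```ruby
# Hypothetical pure-Ruby models of the two proposed fixes; neither uses the
# gvltools API. `ready` means a thread starts waiting for the GVL, `resumed`
# means it acquires the GVL again.

# Option 1: the counter never changes while disabled; enabling resets it,
# because its previous value cannot be trusted to be zero.
class ResetOnEnableCounter
  attr_reader :count

  def initialize
    @mutex = Mutex.new
    @enabled = false
    @count = 0
  end

  def enable
    @mutex.synchronize do
      @count = 0
      @enabled = true
    end
  end

  def disable
    @mutex.synchronize { @enabled = false }
  end

  def ready(_thread)
    @mutex.synchronize { @count += 1 if @enabled }
  end

  def resumed(_thread)
    @mutex.synchronize { @count -= 1 if @enabled }
  end
end

# Option 2: a resume that matches a ready event counted while enabled is
# always applied, even after the instrumentation has been disabled.
class BalancedCounter
  attr_reader :count

  def initialize
    @mutex = Mutex.new
    @enabled = false
    @count = 0
    @counted = {} # threads whose ready event incremented @count
  end

  def enable
    @mutex.synchronize { @enabled = true }
  end

  def disable
    @mutex.synchronize { @enabled = false }
  end

  def ready(thread)
    @mutex.synchronize do
      if @enabled
        @counted[thread] = true
        @count += 1
      end
    end
  end

  def resumed(thread)
    # Decrement regardless of @enabled, but only for a counted ready event.
    @mutex.synchronize { @count -= 1 if @counted.delete(thread) }
  end
end

# Replay the "disable while waiting" scenario against both models.
[ResetOnEnableCounter, BalancedCounter].each do |klass|
  c = klass.new
  c.enable
  c.ready(:a)   # thread :a starts waiting
  c.disable
  c.resumed(:a) # :a resumes while disabled
  c.enable
  puts "#{klass}: #{c.count}"
end
# ResetOnEnableCounter: 0
# BalancedCounter: 0
```

Only the second model also copes with a thread that was already waiting, uninstrumented, when the instrumentation is enabled: in `ResetOnEnableCounter` that thread's resume still takes the count to -1, while `BalancedCounter` ignores it because the thread was never counted.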

Thanks for the repro, I'll work on a fix tomorrow.

Reset while waiting

Hum, so I wasn't envisioning a reset happening while the counter is active. I'm not sure I understand the use case.

I have an idea of how to support it; I'll open a PR and tag you on it to discuss it. It's not perfect, as it still has a very small race-condition window, but it might be fine depending on the use case.

I'll look at the second case once it's done.