BlockUntil insufficient for testing Timers

Question

BlockUntil insufficient for testing Timers

jpittis opened this issue 4 months ago · comments

As per #82, BlockUntil blocks on the number of active waiters, and for Timers/Tickers this means construction. Stop removes the waiter, and Reset adds it again. In practice I've found BlockUntil is insufficient to test code using Timers and Tickers because it can't distinguish between before a call to Stop, and after a call to Reset.

I'm curious if maintainers would be interested in the proposed addition. I've been using this in my test code implemented as a wrapper around clockwork, but supporting it in the library itself would be preferable. Happy to contribute the change.

Real-ish World Example

Imagine you want to write a test for the following state machine:

// Sends an incrementing counter on the counters channel every 3 seconds.
// If a heartbeat isn't received within 5 seconds, exits with an error.
func runStateMachine(
	ctx context.Context,
	clock clockwork.Clock,
	counters chan int,
	heartbeats chan struct{},
) error {

	counterTimer := clock.NewTimer(3 * time.Second)
	defer counterTimer.Stop()

	heartbeatTimeout := clock.NewTimer(5 * time.Second)
	defer heartbeatTimeout.Stop()

	counter := 1

	for {
		select {
		case <-counterTimer.Chan():
			counterTimer = clock.NewTimer(3 * time.Second)
			counters <- counter
			counter++

		case <-heartbeats:
			heartbeatTimeout.Stop()
			heartbeatTimeout = clock.NewTimer(5 * time.Second)

		case <-heartbeatTimeout.Chan():
			return fmt.Errorf("Timeout at %d", clock.Now().Unix())

		case <-ctx.Done():
			return ctx.Err()
		}
	}
}

You'll end up running into a situation where there's no primitive in the clockwork library to ensure that Advance doesn't race with the resetting of a timer:

func TestRunStateMachine_SendingHeartbeatResetsTimeout(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	fakeClock := clockwork.NewFakeClockAt(time.Unix(0, 0))
	counters := make(chan int)
	heartbeats := make(chan struct{})

	done := make(chan error, 1)
	go func() {
		err := runStateMachine(ctx, fakeClock, counters, heartbeats)
		done <- err
	}()

	// Ensure state machine timers are initialized at time zero before advancing.
	fakeClock.BlockUntil(2)
	fakeClock.Advance(3 * time.Second)
	counter := <-counters
	require.Equal(t, 1, counter)

	// Sending a heartbeat gives us another 5 seconds, allowing for another counter to be
	// produced at 6 seconds in.
	heartbeats <- struct{}{}
	log.Println("Trigger race by removing this print statement (at least on my machine)")

	// PROBLEM: What do we block on here to ensure that the heartbeat timer is reset at time 3
	// rather than time 6? We can't use BlockUntil(2) because the condition is already true
	// before heartbeatTimeout.Stop() is called.

	fakeClock.Advance(3 * time.Second)
	counter = <-counters
	require.Equal(t, 2, counter)

	fakeClock.Advance(2 * time.Second)
	err := <-done
	require.Error(t, err)
	require.Contains(t, err.Error(), "Timeout at 8")
}

Possible Solution: Adjust System Under Test

Sometimes it's possible to write your system under test such that this race is avoided. In the above example, we reset the counterTimer before sending on the counters channel, ensuring that the reset always happens before the reset. However, in-practice I've found situations where it's just not reasonable to contort your system under test to solve for this race condition. In the above example, the heartbeatTimeout reset is similar to a real world situation I ran into, and ultimately solved with the solution proposed below.

Proposed Solution: Add BlockUntilMonotonic or BlockUntilSeenSoFar

What I've found works quite well, is a BlockUntilMonotonic method on FakeClock which blocks on the monotonically increasing number of watchers the clock has ever seen.

In the example above, the usage would look like this:

heartbeats <- struct{}{}
fakeClock.BlockUntilMonotonic(4)
// Unblocks after the new Timer is created.

In-practice, I've found reasoning about the total number of watchers that have been created over the life of the test to be tedious, so I've created a helper method that avoids me having to count.

seen := fakeClock.CurrentMonotonic()
heartbeats <- struct{}{}
fakeClock.BlockUntilMonotonic(seen + 1)
// Unblocks after the new Timer is created.

Implementation

I suspect the implementation will be relatively simple: just keep track of the total number of waiters the FakeClock has ever seen, and use a similar blocker primitive to unblock when enough new waiters are seen.

Naming

I don't have strong feelings about the naming. So far I've liked BlockUntilSeenSoFar or BlockUntilMonotonic the most as far as naming goes.

Darren Jacques · Answer 1 · Mon Jun 03 2024 11:20:25 GMT+0800 (China Standard Time)

From my perspective, I'm not generally supportive of making BlockUntilContext even more complex. It's behavior is already kinda racey and hard to manage.