Why do we need to verify runningWorkerCount when submitting a task?

Question

Why do we need to verify runningWorkerCount when submitting a task?

chengyayu opened this issue 4 years ago · comments

hi @alitto pond is great. Thank you for your effort. I am studying the code of pond trying to figure out its design structure，
the verify of runningWorkerCount > 0 && p.Idle() > 0 in the following piece of code causes my confusion.

func (p *WorkerPool) submit(task func(), waitForIdle bool) bool {
	if task == nil {
		return false
	}

	runningWorkerCount := p.Running()

	// Attempt to dispatch to an idle worker without blocking
	if runningWorkerCount > 0 && p.Idle() > 0 {
		select {
		case p.tasks <- task:
			return true
		default:
			// No idle worker available, continue
		}
	}

	maxWorkersReached := runningWorkerCount >= p.maxWorkers

	// Exit if we have reached the max. number of workers and can't wait for an idle worker
	if maxWorkersReached && !waitForIdle {
		return false
	}

	// Start a worker as long as we haven't reached the limit
	if ok := p.maybeStartWorker(task); ok {
		return true
	}

	// Submit the task to the tasks channel and wait for it to be picked up by a worker
	p.tasks <- task
	return true
}

What's the problem if we replace runningWorkerCount > 0 && p.Idle() > 0 with p.Idle() > 0?

Alejandro Durante · Answer 1 · Thu Nov 05 2020 05:23:13 GMT+0800 (China Standard Time)

Hey @chengyayu! thank you 🙂
We need to check the worker count using the workerCount counter because idleWorkerCount is not as reliable. If we only check whether p.Idle() > 0, then it could happen that we submit a task to the tasks channel right before this line https://github.com/alitto/pond/blob/master/pond.go#L331, where idleWorkerCount is decremented.
This can cause the task to wait in the tasks channel forever (if no other tasks are submitted afterwards).
workerCount, on the other hand, is updated in a synchronized fashion using a mutex, and that, along with checking the idle counter > 0, guarantees there is at least 1 running goroutine. This goroutine may or may not be really idle, but that's fine because we just need to make sure the task will eventually be picked up.

Yayu · Answer 2 · Fri Nov 06 2020 10:34:35 GMT+0800 (China Standard Time)

Hey @alitto thank you for your reply

We need to check the worker count using the workerCount counter because idleWorkerCount is not as reliable. If we only check whether p.Idle() > 0, then it could happen that we submit a task to the tasks channel right before this line https://github.com/alitto/pond/blob/master/pond.go#L331, where idleWorkerCount is decremented.

If we check runningWorkerCount>0 && p.Idle()>0, it could also happen that we submit a task to the tasks channel right before this line https://github.com/alitto/pond/blob/master/pond.go#L331, where idleWorkerCount is decremented. and at that moment, the runningWorkerCount may be 1, am i right ?

This can cause the task to wait in the tasks channel forever (if no other tasks are submitted afterwards). But i do not know how to prove it.