sourcegraph / conc

Better structured concurrency for go

Home Page: https://about.sourcegraph.com/blog/building-conc-better-structured-concurrency-for-go

goroutine leak

miparnisari opened this issue

I wrote a benchmark for pool and, unless my benchmark is wrong or my understanding of how the library works is wrong, there is a goroutine leak:

func BenchmarkPool(b *testing.B) {
	b.Run("without_error", func(b *testing.B) {
		fmt.Println("before", runtime.NumGoroutine())
		for i := 0; i < b.N; i++ {
			p := pool.New().WithMaxGoroutines(10)
			for j := 0; j < 1000; j++ {
				p.Go(func() {
					r := rand.Intn(10)
					time.Sleep(time.Duration(r) * time.Microsecond)
				})
			}

			p.Wait()
		}
		fmt.Println("after", runtime.NumGoroutine())
	})
}

If you run this: go test -v ./pool -run=XXX -bench=BenchmarkPool -count 10000

most of the time, I get

before 3
after 3
before 3
after 3
before 3
after 3

but I also saw

before 3
after 3
before 3
after 4  // <------- ?
before 3
after 3

I tested this on a MacBook Pro with an Intel i5; it might be harder to reproduce on faster CPUs.

Thanks for the diligence! I expect this is simply a race condition between defer wg.Done() and the actual exit of the goroutine. Calling wg.Done() will wake up the waiting goroutine (the goroutine calling p.Wait()) immediately, so if you call runtime.NumGoroutine() between the time when wg.Done() is called and the goroutine exits, then it will appear there is a goroutine leak. I don't think there is really anything we can do about that because there is no such thing as a goroutine "handle", so a signal right before goroutine exit is the best we can do, and that's fundamentally racy.
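The window described above can be made visible with a minimal sketch that uses only the standard library. It is not conc's implementation: the `release` channel and the `observeAfterSignal` name are hypothetical, and the channel artificially stretches the interval between `wg.Done()` and the goroutine's exit so that the effect shows up on every run rather than occasionally.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// observeAfterSignal starts one goroutine that calls wg.Done() and then
// lingers until release is closed. The lingering stands in for the short
// interval between a deferred wg.Done() and the real goroutine exit.
// (Hypothetical helper, for illustration only.)
func observeAfterSignal() (before, after int) {
	before = runtime.NumGoroutine()

	var wg sync.WaitGroup
	release := make(chan struct{})
	wg.Add(1)
	go func() {
		wg.Done() // the waiter is released here...
		<-release // ...while this goroutine is still alive
	}()

	wg.Wait()
	// The spawned goroutine has signalled but not exited, so it is
	// still included in the count.
	after = runtime.NumGoroutine()
	close(release) // let the goroutine finish
	return before, after
}

func main() {
	before, after := observeAfterSignal()
	fmt.Println("before", before, "after", after)
}
```

In real code the interval is only a few instructions long, which would explain why the extra goroutine shows up so rarely in the benchmark.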

I did manage to reproduce it once on my machine, using a modified version of your benchmark that re-measures a few times after a failure. In my reproduction, sleeping for 1us was enough for the stray goroutine to exit.

	b.Run("without_error", func(b *testing.B) {
		before := runtime.NumGoroutine()
		for i := 0; i < b.N; i++ {
			p := pool.New().WithMaxGoroutines(10)
			for j := 0; j < 1000; j++ {
				p.Go(func() {
					r := rand.Intn(10)
					time.Sleep(time.Duration(r) * time.Microsecond)
				})
			}

			p.Wait()
		}
		after := runtime.NumGoroutine()
		if after != before {
			time.Sleep(time.Microsecond)
			after2 := runtime.NumGoroutine()
			time.Sleep(time.Millisecond)
			after3 := runtime.NumGoroutine()
			time.Sleep(time.Second)
			after4 := runtime.NumGoroutine()
			// reproduction printed 3 4 3 3 3
			b.Fatalf("%d %d %d %d %d", before, after, after2, after3, after4)
		}
	})
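
The re-measuring idea above can be folded into a small polling helper, so a leak check tolerates goroutines that have signalled but not yet exited. This is a sketch under assumptions: `settle`, `spawnAndWait`, and `checkLeak` are hypothetical names, the backoff schedule and one-second timeout are arbitrary, and a plain `sync.WaitGroup` stands in for the pool.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// settle re-reads runtime.NumGoroutine with growing pauses until the count
// drops to want or below, or until timeout has elapsed. It returns the last
// observed count. (Hypothetical helper.)
func settle(want int, timeout time.Duration) int {
	deadline := time.Now().Add(timeout)
	pause := time.Microsecond
	for {
		n := runtime.NumGoroutine()
		if n <= want || time.Now().After(deadline) {
			return n
		}
		time.Sleep(pause)
		if pause < 100*time.Millisecond {
			pause *= 10 // back off: 1us, 10us, 100us, ...
		}
	}
}

// spawnAndWait starts n short-lived goroutines and waits for all of them
// to signal completion via a deferred wg.Done().
func spawnAndWait(n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
		}()
	}
	wg.Wait()
}

// checkLeak records the goroutine count, runs the workload, and reports
// the count once it has been given up to one second to settle.
func checkLeak(n int) (before, after int) {
	before = runtime.NumGoroutine()
	spawnAndWait(n)
	return before, settle(before, time.Second)
}

func main() {
	before, after := checkLeak(1000)
	if after > before {
		fmt.Println("possible leak:", before, "->", after)
		return
	}
	fmt.Println("no leak detected")
}
```

A count that stays above the baseline for the whole timeout is a stronger sign of a real leak than a single reading taken right after `Wait()` returns.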