jackc / pgx

PostgreSQL driver and toolkit for Go

Ever Increasing number of Go routines with Pgx v5.5.5 and CockroachDB

victor-ferrer-form3 opened this issue · comments

Describe the bug

We recently upgraded our software to pgx v5.5.5 and immediately noticed that the number of goroutines our pods use keeps increasing.
This is a screenshot from our monitoring tool, showing the go_goroutines metric:

[Screenshot: graph of the go_goroutines metric over time]

The point where it starts to creep up matches our update from v5.5.2 to v5.5.5.

If we enable pprof after the pod has been running for a few hours, we see this:

goroutine profile: total 318
[...]
74 @ 0x43e32e 0x4099ad 0x4095b2 0x9529ec 0x471501
#	0x9529eb	github.com/jackc/pgx/v5/pgconn/internal/ctxwatch.(*ContextWatcher).Watch.func1+0x8b	/build/vendor/github.com/jackc/pgx/v5/pgconn/internal/ctxwatch/context_watcher.go:51

To Reproduce
Steps to reproduce the behavior:

If possible, please provide runnable example such as:

package main

import (
	"context"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

func main() {
	conn, err := pgx.Connect(context.Background(), os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(context.Background())

	// Your code here...
}

Please run your example with the race detector enabled. For example, go run -race main.go or go test -race.

Expected behavior
We would expect those watches to be cancelled/closed and the goroutines to exit.

Actual behavior
The context watcher goroutines do not seem to be stopped properly.

Version

  • Go: $ go version -> 1.22
  • Cockroach DB:
cockroach version details:
Build Tag:        v23.1.12
Build Time:       2023/11/09 06:15:38
Distribution:     CCL
Platform:         linux amd64 (x86_64-pc-linux-gnu)
Go Version:       go1.19.13
C Compiler:       gcc 6.5.0
Build Commit ID:  d7e9824b4cd6ebf7a8548156f2a772ae6648257d
Build Type:       release
Enabled Assertions: false
(use 'cockroach version --build-tag' to display only the build tag)

  • pgx: 5.5.5

Note: Reverting to v5.5.2 solves the issue.

Please provide more details. I don't think this happens for every query/use case. For example, I can't replicate it using pgx in database/sql mode (tested both Postgres and CockroachDB).

Probably an Unwatch call is missing in some edge case in the code that changed between 5.5.2 and 5.5.5.

Hello @drakkan!

One thing we have noticed is that the only service in which we see this behavior is one that uses Batch statements.
We have also noticed this commit, introduced as part of release 5.5.4, which has several changes related to batches, although I am not sure whether it causes the issue.

To try to narrow the problem down, we are going to repeat our tests with pgx v5.5.3 and let you know the results.

Update: pgx v5.5.3 does not have this problem.

@victor-ferrer-form3 : have you tried with pgx v5.5.4, or had any success bisecting the problem?

Hi @sean-,
Yes, v5.5.4 has the problem too. For the moment, our only solution has been to downgrade to v5.5.3 and add exceptions for the security vulnerabilities it has.