jackc / pgx

PostgreSQL driver and toolkit for Go

Context cancellation has no effect when pipelines are executing

pascal-za opened this issue

Describe the bug
After moving from pgx v4 to v5.4.3, we noticed that cancelling a context while a SendBatch is in progress no longer has any effect. In v4, and in v5 when config.ConnConfig.DefaultQueryExecMode is set to pgx.QueryExecModeSimpleProtocol, a context timeout correctly closes the connection.

As a workaround, it might be helpful to be able to at least disable pipelining independently of statement caching. At the moment pipelining is implicitly tied to the exec mode. We also considered a statement_timeout, but it's not a great substitute as it applies to a single statement and not the whole batch.

To Reproduce
The following example sets a timeout of 7 seconds, but hangs for the full 20 seconds needed to run the batch. It was tested on macOS 13.5.2, but we're seeing the same behaviour in production Linux containers:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	url := "postgres://some-url"

	ctx, cancel := context.WithTimeout(context.Background(), 7*time.Second)
	defer cancel()

	config, err := pgxpool.ParseConfig(url)
	if err != nil {
		log.Fatal(err)
	}

	pool, err := pgxpool.NewWithConfig(ctx, config)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	tx, err := pool.Begin(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// ACTUAL ISSUE REPRO HERE
	batch := &pgx.Batch{}
	batch.Queue("select pg_sleep(10)")
	batch.Queue("select pg_sleep(10)")

	res := tx.SendBatch(ctx, batch)

	var n string
	err = res.QueryRow().Scan(&n)
	if err != nil {
		log.Fatal(err)
	}
	err = res.QueryRow().Scan(&n)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("ran whole batch")

	// END OF ISSUE REPRO

	res.Close()
	err = tx.Commit(ctx)
	if err != nil {
		log.Fatal(err)
	}
}

Expected behavior
When using QueryExecModeSimpleProtocol the connection is closed correctly:

FATA[0007] timeout: context deadline exceeded           
exit status 1
go run cmd/timeout-test/main.go  0.84s user 0.54s system 16% cpu 8.161 total

Actual behavior
The calling goroutine hangs until the batch is done, regardless of timeout:

ran whole batch
FATA[0020] timeout: context already done: context deadline exceeded 
exit status 1
go run cmd/timeout-test/main.go  0.90s user 0.54s system 6% cpu 21.294 total

Version

  • Go: go version go1.21.0 darwin/arm64
  • PostgreSQL: PostgreSQL 12.16 (Debian 12.16-1.pgdg110+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
  • pgx: v5.4.3

I don't think the pipeline mode should itself cause a problem.

...

Yup, looks like the ctx wasn't being passed all the way down.

Try as of a61517a.

Confirmed, this fixes the lack of timeouts with SendBatch. Simpler than expected, thank you!

One very minor thing to note is that when the context times out in this scenario the error returned from Scan() is:

FATA[0007] read tcp [::1]:64106->[::1]:5432: i/o timeout 

In contrast to a simple query using Exec():

FATA[0007] timeout: context deadline exceeded  

Therefore, checking something like errors.Is(err, context.DeadlineExceeded) would fail. I don't consider that a huge problem, but just to be aware, in case folks get confused looking for some kind of networking issue.

Good catch. I try to normalize errors when possible. Fixed in 163eb68.

Also, you may want to consider using https://pkg.go.dev/github.com/jackc/pgx/v5@v5.4.3/pgconn#Timeout as it also checks for multiple timeout types.

Thanks for fixing this! Do you think we could get a new bugfix release with this? I would love to upgrade to the fix, but the latest tagged version does not contain it.

Again thank you so much for all the good work here.

@domano v5.5.0 was just released.