Context cancellation has no effect when pipelines are executing

pascal-za opened this issue a year ago

Describe the bug
After moving from pgx v4 to v5.4.3, we noticed that cancelling a context while a SendBatch is in progress no longer has any effect. In v4, and in v5 when config.ConnConfig.DefaultQueryExecMode is set to pgx.QueryExecModeSimpleProtocol, a context timeout correctly closes the connection.

As a workaround, it would help to be able to disable pipelining independently of statement caching. At the moment pipelining is implicitly tied to the exec mode. We also considered a statement_timeout, but it's not a great substitute because it applies to each statement individually rather than to the batch as a whole.

To Reproduce
The following example sets a timeout of 7 seconds, but hangs for the full 20 seconds needed to run the batch. It was tested on macOS 13.5.2, but we're seeing the same behaviour in production Linux containers:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

func main() {
	url := "postgres://some-url"

	ctx, cancel := context.WithTimeout(context.Background(), 7*time.Second)
	defer cancel()

	config, err := pgxpool.ParseConfig(url)
	if err != nil {
		log.Fatal(err)
	}

	pool, err := pgxpool.NewWithConfig(ctx, config)
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	tx, err := pool.Begin(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// ACTUAL ISSUE REPRO HERE
	batch := &pgx.Batch{}
	batch.Queue("select pg_sleep(10)")
	batch.Queue("select pg_sleep(10)")

	res := tx.SendBatch(ctx, batch)

	var n string
	err = res.QueryRow().Scan(&n)
	if err != nil {
		log.Fatal(err)
	}
	err = res.QueryRow().Scan(&n)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("ran whole batch")

	// END OF ISSUE REPRO

	res.Close()
	err = tx.Commit(ctx)
	if err != nil {
		log.Fatal(err)
	}
}

Expected behavior
When using QueryExecModeSimpleProtocol the connection is closed correctly:

FATA[0007] timeout: context deadline exceeded           
exit status 1
go run cmd/timeout-test/main.go  0.84s user 0.54s system 16% cpu 8.161 total

Actual behavior
The calling goroutine hangs until the batch is done, regardless of timeout:

ran whole batch
FATA[0020] timeout: context already done: context deadline exceeded 
exit status 1
go run cmd/timeout-test/main.go  0.90s user 0.54s system 6% cpu 21.294 total

Version

Go: go version go1.21.0 darwin/arm64
PostgreSQL: PostgreSQL 12.16 (Debian 12.16-1.pgdg110+1) on aarch64-unknown-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
pgx: v5.4.3

Jack Christensen · Answer 1 · Fri Sep 29 2023 09:02:13 GMT+0800 (China Standard Time)

I don't think the pipeline mode should itself cause a problem.

...

Yup, looks like the ctx wasn't being passed all the way down.

Try as of a61517a.

Pascal Houliston · Answer 2 · Fri Sep 29 2023 15:32:29 GMT+0800 (China Standard Time)

> Yup, looks like the ctx wasn't being passed all the way down.
>
> Try as of a61517a.

Confirmed, this fixes the lack of timeouts with SendBatch. Simpler than expected, thank you!

One very minor thing to note is that when the context times out in this scenario the error returned from Scan() is:

FATA[0007] read tcp [::1]:64106->[::1]:5432: i/o timeout

In contrast to a simple query using Exec():

FATA[0007] timeout: context deadline exceeded

Therefore, checking something like errors.Is(err, context.DeadlineExceeded) would fail. I don't consider that a huge problem, but just to be aware, in case folks get confused looking for some kind of networking issue.

Jack Christensen · Answer 3 · Sat Sep 30 2023 21:54:18 GMT+0800 (China Standard Time)

Good catch. I try to normalize errors when possible. Fixed in 163eb68.

Also, you may want to consider using https://pkg.go.dev/github.com/jackc/pgx/v5@v5.4.3/pgconn#Timeout as it also checks for multiple timeout types.

Dino Omanovic · Answer 4 · Fri Nov 03 2023 17:30:13 GMT+0800 (China Standard Time)

Thanks for fixing this! Do you think we could get a new bugfix release with this? I would love to upgrade to the fix, but the latest tagged version does not contain it.

Again thank you so much for all the good work here.

Jack Christensen · Answer 5 · Sat Nov 04 2023 23:36:00 GMT+0800 (China Standard Time)

@domano v5.5.0 was just released.