Occasional sql: transaction has already been committed or rolled back errors
samsondav opened this issue · comments
We are seeing transaction has already been committed or rolled back errors
if we try to use the database "too quickly" after initializing it, with a cancel context.
Code looks like this:
func init() {
dbURL := os.Getenv("DATABASE_URL")
if dbURL == "" {
log.Fatal("You must provide a DATABASE_URL environment variable")
}
txdb.Register("txdb", "postgres", dbURL)
}
func DBWithDefaultContext(db *gorm.DB, fc func(db *gorm.DB) error) error {
ctx, cancel := context.WithTimeout(context.Background(), 10 * time.Second)
defer cancel()
return fc(db.WithContext(ctx))
}
func Test_example(t *testing.T) {
db, err := sql.Open("txdb", uuid.NewV4().String())
require.NoError(t, err)
t.Cleanup(func() { assert.NoError(t, db.Close()) })
gormDB, err := gorm.Open(postgres.New(postgres.Config{
Conn: sqlDB,
DSN: uuid.NewV4().String(),
}), &gorm.Config{})
require.NoError(t, err)
err = DBWithDefaultContext(gormDB, func(db *gorm.DB) error {
return db.Order("created_at ASC, address ASC").Where("deleted_at IS NULL").Find(&keys).Error
})
err := db.Exec(`SELECT 1`).Error // ERROR: sql: transaction has already been committed or rolled back
}
Note that this error about "transaction already committed or rolled back" has nothing to do with postgres (actually the transaction never gets closed). The postgres logs look like this:
2021-07-15 10:05:52.719 BST [63393] LOG: statement: ;
2021-07-15 10:05:52.720 BST [63393] LOG: statement: BEGIN READ WRITE
2021-07-15 10:05:52.720 BST [63393] LOG: statement: SELECT * FROM "keys" WHERE deleted_at IS NULL AND "keys"."deleted_at" IS NULL ORDER BY created_at ASC, address ASC
// ENDS HERE
It must be a timing issue because it can be resolved by inserting a sleep or gosched here:
db, err := sql.Open("txdb", uuid.NewV4().String())
require.NoError(t, err)
t.Cleanup(func() { assert.NoError(t, db.Close()) })
runtime.Gosched() // FIXES THE TEST
// or
time.Sleep(1 * time.Millisecond) // FIXES THE TEST
// ... rest of the test code
Note that I was running this test with a parallelism flag of 1 - single threaded mode.
UPDATE:
It seems that calling runtime.Gosched()
only sometimes fixes it - not always. Calling db.Exec('SELECT 1')
in place of that seems to be a more reliable fix.
I am also getting this since v0.1.4
.
runtime.Gosched and time.Sleep do not fix it for me
The only fix for me was to revert to v0.1.3
should be fixed by commit now.
the patch before introduced a chance for deadlock, but it also was necessary to prevent shared resource access. hope this will solve both issues. please reopen in case if you see any issue on v0.1.5
@l3pp4rd sorry but I am still getting the errors even when using v0.1.5. Using v0.1.3 shows no errors.
sql: transaction has already been committed or rolled back
I'm also experiencing it on v0.1.5
also stuck with gorm
in our project. Any update on this?
I think I've tracked this down. You can reproduce it by creating a db then issuing an ExecContext
. After than first insert is completed, the context that begins the tx is still being used. That insert is over, and the context cancelation shouldnt' affect the db as a whole, but rather the exec. After the first ExecContext
is completed successfully, cancel the context used for that exec, and create a new insert with new context and it fails with tx has been rolled back. I'm going to see if I can work this up a bit more. When I add this test to the db_test.go
it fails as expected:
failing test
func TestShouldHandleStmtsWithoutContextPollution(t *testing.T) {
t.Parallel()
for _, driver := range drivers() {
t.Run(driver, func(t *testing.T) {
db, err := sql.Open(driver, "contextpollution")
if err != nil {
t.Fatalf(driver+": failed to open a connection, have you run 'make test'? err: %s", err)
}
defer db.Close()
const insertSQL = "INSERT INTO users (username, email) VALUES (?, ?)"
ctx1, cancel1 := context.WithCancel(context.Background())
defer cancel1()
_, err = db.ExecContext(ctx1, insertSQL, "first", "first@foo.com")
if err != nil {
t.Fatalf("unexpected error inserting user 1: %s", err)
}
cancel1()
ctx2, cancel2 := context.WithCancel(context.Background())
defer cancel2()
_, err = db.ExecContext(ctx2, insertSQL, "second", "second@foo.com")
if err != nil {
t.Fatalf("unexpected error inserting user 2: %s", err)
}
cancel2()
rows, err := db.QueryContext(context.Background(), "select username from users")
if err != nil {
t.Fatalf("unexpected error querying users: %s", err)
}
defer rows.Close()
var users []string
for rows.Next() {
var user string
err := rows.Scan(&user)
if err != nil {
t.Errorf("unexpected scan failure: %s", err)
continue
}
users = append(users, user)
}
sort.Strings(users)
wanted := []string{"first", "second"}
if len(users) != 2 {
t.Fatalf("invalid users received; want=%v\tgot=%v", wanted, users)
}
for i, want := range wanted {
if got := users[i]; want != got {
t.Errorf("invalid user; want=%s\tgot=%s", want, got)
}
}
})
}
}
submitted PR to address the context pollution issue. Its passing all tests and runs without a hitch locally as well. Fixes the issues I was seeing.
@veqryn, I'm curious if you'll get errors with the branch I just pushed to fix an issue with context handling. Its fixed both gorm and sqlx tests locally.
now that PR #48 has merged, I think its safe to close this issue
@samsondav, I’m going to close this issue. It should be resolved now. Feel free to return if the issue resurfaces on v0.1.6 or later 👍
We're seeing this in our tests. Our testcases each have their own context that gets cancelled when the testcase finishes.
There is a race condition that the whole transaction may be canceled and the next testcase will use the canceled c.tx
. Specifically, the select statement in that goroutine may observe either <-done
which doesn't do anything or <-ctx.Done()
which cancels rootCtx
, triggering the error on the next operation.
Lines 56 to 58 in 5bd9aad
As a temporary fix, we just removed the whole goroutine to prevent the global transaction being rolled back by a single operation's context.
It must be a timing issue because it can be resolved by inserting a sleep or gosched here:
By inserting a delay before the context gets canceled gives the goroutine in beginTxOnce
time to select the <-done
case but it is not deterministic.