jackc / pgx

PostgreSQL driver and toolkit for Go


CollectRows with pgtype.Bits returns corrupted data on large result sets

vitprajzler opened this issue

Describe the bug
When collecting a large result set from pool.Query using pgx.CollectRows, if the result contains a BIT(n) column (scanned as pgtype.Bits), the returned pgtype.Bits values are corrupted. On a table with only a SERIAL and a BIT(32) column, the corruption starts at 431 rows.

To Reproduce

package db_test

import (
	"context"
	"testing"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgtype"
	"github.com/jackc/pgx/v5/pgxpool"
)

const DB_URL = "postgresql://postgres@/bittest"

func TestBitArray(t *testing.T) {
	pool, err := pgxpool.New(context.Background(), DB_URL)
	if err != nil {
		t.Fatal(err)
	}

	_, err = pool.Exec(context.Background(), "DROP TABLE IF EXISTS test_bitarray")
	if err != nil {
		t.Fatal(err)
	}
	_, err = pool.Exec(context.Background(), "CREATE TABLE test_bitarray (id SERIAL PRIMARY KEY, bits BIT(32))")
	if err != nil {
		t.Fatal(err)
	}

	bitArray := "00011000000000000000010000000100"
	count := 431

	for i := 0; i < count; i++ {
		_, err = pool.Exec(context.Background(), "INSERT INTO test_bitarray (bits) VALUES ($1)", bitArray)
		if err != nil {
			t.Fatal(err)
		}
	}

	rows, err := pool.Query(context.Background(), "SELECT bits FROM test_bitarray")
	if err != nil {
		t.Fatal(err)
	}

	rowBits, err := pgx.CollectRows[pgtype.Bits](rows, pgx.RowTo[pgtype.Bits])
	if err != nil {
		t.Fatal(err)
	}

	rows.Close()

	if len(rowBits) != count {
		t.Fatalf("Number of rows %d is not %d", len(rowBits), count)
	}

	for i, bits := range rowBits {
		bitsAsString, err := bits.Value()
		if err != nil {
			t.Fatal(err)
		}

		if bitsAsString != bitArray {
			t.Fatalf("Bit array %d is not as expected, %s != %s", i, bitsAsString, bitArray)
		}
	}
}

I ran the test with the race detector and got no data race reports.

Expected behavior
The returned rows should match the rows in the database.

Actual behavior
The returned rows contain corrupted data that does not match the data in the database.

Version

  • Go: go version go1.22.0 linux/amd64
  • PostgreSQL: PostgreSQL 16.1 on x86_64-pc-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit
  • pgx: v5.5.3

Additional context
It looks like the corruption threshold is related to how much memory the rows require. I have a table that stores more data alongside a BIT(32) column, and there it takes only ~30 rows to see corrupted data. Interestingly, columns other than the bit arrays are not corrupted. If the table has two BIT(32) columns, both get corrupted.

Scanning with rows.Scan instead of CollectRows seems to work even on large result sets, as long as pgtype.Bits.Value is called in the same loop iteration as rows.Scan. In other words, the problem persists if I first Scan all the rows and only afterwards call pgtype.Bits.Value. A sketch of the working pattern follows.
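For illustration, a minimal sketch of that manual-scan pattern, assuming it runs inside the same test function as the reproduction above (so pool, t, and bitArray are in scope):

	// Scan row by row and convert each value before advancing to the next row.
	rows, err := pool.Query(context.Background(), "SELECT bits FROM test_bitarray")
	if err != nil {
		t.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var bits pgtype.Bits
		if err := rows.Scan(&bits); err != nil {
			t.Fatal(err)
		}

		// Calling Value here, before the next rows.Next, avoids the corruption;
		// deferring it until after the loop reproduces the bug on affected versions.
		bitsAsString, err := bits.Value()
		if err != nil {
			t.Fatal(err)
		}
		if bitsAsString != bitArray {
			t.Fatalf("unexpected bit string: %s != %s", bitsAsString, bitArray)
		}
	}
	if err := rows.Err(); err != nil {
		t.Fatal(err)
	}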

The problem was that pgtype.Bits.Bytes was a slice into the driver's read buffer, so later reads would overwrite values that had already been scanned. It now makes a copy of the data.
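For versions without that fix, a workaround along these lines should also avoid the aliasing by copying the bytes as each row is collected. The helper name rowToBitsCopy is mine, not part of pgx; only pgx.CollectableRow, pgx.CollectRows, and pgtype.Bits are real API:

	// rowToBitsCopy scans a single BIT(n) column and detaches the result from the
	// driver's shared read buffer by copying the underlying bytes.
	func rowToBitsCopy(row pgx.CollectableRow) (pgtype.Bits, error) {
		var bits pgtype.Bits
		if err := row.Scan(&bits); err != nil {
			return pgtype.Bits{}, err
		}
		// Copy before the next row reuses the buffer.
		bits.Bytes = append([]byte(nil), bits.Bytes...)
		return bits, nil
	}

	// Usage: rowBits, err := pgx.CollectRows(rows, rowToBitsCopy)

This mirrors the fix conceptually: each value is copied out of the shared buffer before the next row is read.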