CollectRows with pgtype.Bits returns corrupted data on large result sets
vitprajzler opened this issue
Describe the bug
When collecting a large result from a `pool.Query` using `pgx.CollectRows`, if the result set has a BIT(32) (`pgtype.Bits`) column, the returned `pgtype.Bits` values are corrupted. On a table with only a SERIAL and a BIT(32) column, the corruption starts at 431 rows.
To Reproduce
```go
package db_test

import (
	"context"
	"testing"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgtype"
	"github.com/jackc/pgx/v5/pgxpool"
)

const DB_URL = "postgresql://postgres@/bittest"

func TestBitArray(t *testing.T) {
	pool, err := pgxpool.New(context.Background(), DB_URL)
	if err != nil {
		t.Fatal(err)
	}
	_, err = pool.Exec(context.Background(), "DROP TABLE IF EXISTS test_bitarray")
	if err != nil {
		t.Fatal(err)
	}
	_, err = pool.Exec(context.Background(), "CREATE TABLE test_bitarray (id SERIAL PRIMARY KEY, bits BIT(32))")
	if err != nil {
		t.Fatal(err)
	}

	bitArray := "00011000000000000000010000000100"
	count := 431
	for i := 0; i < count; i++ {
		_, err = pool.Exec(context.Background(), "INSERT INTO test_bitarray (bits) VALUES ($1)", bitArray)
		if err != nil {
			t.Fatal(err)
		}
	}

	rows, err := pool.Query(context.Background(), "SELECT bits FROM test_bitarray")
	if err != nil {
		t.Fatal(err)
	}
	rowBits, err := pgx.CollectRows[pgtype.Bits](rows, pgx.RowTo[pgtype.Bits])
	if err != nil {
		t.Fatal(err)
	}
	rows.Close()

	if len(rowBits) != count {
		t.Fatalf("Number of rows %d is not %d", len(rowBits), count)
	}
	for i, bits := range rowBits {
		bitsAsString, err := bits.Value()
		if err != nil {
			t.Fatal(err)
		}
		if bitsAsString != bitArray {
			t.Fatalf("Bit array %d is not as expected, %s != %s", i, bitsAsString, bitArray)
		}
	}
}
```
Running the test with the race detector produced no race reports.
Expected behavior
The returned rows should match the rows in the database.
Actual behavior
The returned rows contain corrupted data that does not match the data in the database.
Version
- Go: `go version go1.22.0 linux/amd64`
- PostgreSQL: `PostgreSQL 16.1 on x86_64-pc-linux-musl, compiled by gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924, 64-bit`
- pgx: `v5.5.3`
Additional context
It looks like the corruption threshold is related to how much memory the rows need: on a table that stores more data alongside a BIT(32) column, it takes only ~30 rows to see corrupted data. Interestingly, columns other than the bit arrays are not corrupted, and with two BIT(32) columns, both get corrupted.
Using `rows.Scan` instead of `CollectRows` seems to work even on large results, as long as `pgtype.Bits.Value` is called in the same loop iteration as the `rows.Scan`. In other words, the problem persists if I `Scan` all rows first and only then call `pgtype.Bits.Value`.
The problem was that `pgtype.Bits.Bytes` was a subslice of the driver's shared read buffer, so later reads overwrote values that had already been scanned. It now makes a copy of the data.