jackc / pgx

PostgreSQL driver and toolkit for Go

storing a string Go value into bytea PostgreSQL column works, but reading it back fails

mitar opened this issue

Is your feature request related to a problem? Please describe.

With recent Go improvements to convert strings from and to []byte without copying, I would like storing strings into a bytea column to work transparently, with as little copying as possible.

I am writing a key-value store on top of PostgreSQL and I would like to simplify storing of built-in strings so that users of my library would not have to convert them to []byte.

Describe the solution you'd like

It looks like storing a string Go value works already, but reading back fails with:

cannot scan bytea (OID 17) in binary format into *string

I do not see much reason why scanning bytea into string would not just work for standard types with pgx. Go allows conversion between string and []byte with a simple conversion, so I would hope that storing a string into the database would work the same.

Describe alternatives you've considered

My understanding is that I could register my own type codec, but I could not find much information about how to extend an existing codec: I would like the default bytea handling to stay the same, only with strings additionally scanned without copying.

I suspect there are unresolvable ambiguities regarding string <> bytea.

Writing the string to a PostgreSQL bytea at least appears to work because pgx sends any string to PostgreSQL in the text format without any conversion. pgx does the same thing when reading. Any text formatted value can be scanned directly into a string without any conversion.

But what happens when you read the value? In the text format, PostgreSQL will return the value hex encoded, but in binary you will get back the raw bytes.

For example: if you write the 4 bytes jack to a bytea you will get back the ten bytes \x6a61636b in the text format but the original 4 bytes in binary format.
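To make the size difference concrete, here is a minimal sketch that builds the hex text representation by hand. The helper name byteaHexText is hypothetical and not part of pgx; it only mimics PostgreSQL's hex output for bytea (a \x prefix followed by two hex digits per byte), assuming the server's bytea_output setting is left at its default of hex.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// byteaHexText is a hypothetical helper (not a pgx function). It mimics
// PostgreSQL's hex text output for bytea: `\x` plus two hex digits per byte.
func byteaHexText(raw []byte) string {
	return `\x` + hex.EncodeToString(raw)
}

func main() {
	raw := []byte("jack")
	text := byteaHexText(raw)

	fmt.Println(len(raw), string(raw)) // binary format: the 4 original bytes
	fmt.Println(len(text), text)       // text format: 10 bytes, \x6a61636b
}
```

A string scanned from the text format would therefore hold the encoded form rather than the bytes that were written, which is the ambiguity described above.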

Even working directly in psql it can be confusing dealing with strings in bytea. I recommend that you instead use text or varchar if possible.

> My understanding is that I could register my own type codec, but I could not find much information about how to extend an existing codec: I would like the default bytea handling to stay the same, only with strings additionally scanned without copying.

If you really need/want to do this, you probably don't have to create a new Codec. I think you would only need to create your own type that implements pgtype.BytesScanner and pgtype.BytesValuer.
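A rough sketch of what such a type could look like follows. The type name ByteaString is hypothetical. To keep the example runnable without pgx, the two interfaces are mirrored locally with the method signatures as I understand them from pgx v5 (ScanBytes(v []byte) error and BytesValue() ([]byte, error)); please check them against the pgtype package in your version. In real code the type would simply be passed to Scan or used as a query argument.

```go
package main

import "fmt"

// Local mirrors of pgtype.BytesScanner and pgtype.BytesValuer.
// Assumption: these match the pgx v5 signatures; verify before relying on them.
type bytesScanner interface {
	ScanBytes(v []byte) error
}

type bytesValuer interface {
	BytesValue() ([]byte, error)
}

// ByteaString is a hypothetical string type intended for bytea columns.
type ByteaString string

// ScanBytes copies v, because the driver may reuse the slice afterwards.
// In this sketch a nil v (SQL NULL) simply becomes the empty string.
func (s *ByteaString) ScanBytes(v []byte) error {
	*s = ByteaString(v) // the conversion copies the bytes
	return nil
}

// BytesValue returns the string's bytes for writing to the database.
func (s ByteaString) BytesValue() ([]byte, error) {
	return []byte(s), nil
}

// Compile-time checks that the method sets match the mirrored interfaces.
var (
	_ bytesScanner = (*ByteaString)(nil)
	_ bytesValuer  = ByteaString("")
)

func main() {
	var s ByteaString
	buf := []byte("jack")
	_ = s.ScanBytes(buf)
	buf[0] = 'J' // simulate the driver overwriting its buffer

	out, _ := s.BytesValue()
	fmt.Println(s, string(out)) // the scanned value is unaffected by the overwrite
}
```

The copy in ScanBytes is deliberate: it trades one allocation for a value that stays valid after the next row is read.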

Thanks for the response.

> If you really need/want to do this, you probably don't have to create a new Codec. I think you would only need to create your own type that implements pgtype.BytesScanner and pgtype.BytesValuer.

But that would require one to scan into this custom type, not string itself.

For string itself, it seems like I would have to unregister ByteaCodec and register a custom one? Am I mistaken? Or is there an easier way? Maybe just using a TryWrapScanPlanFuncs?

I am just thinking that since Go 1.20, there is a cheap way to convert between string and []byte:

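import "unsafe"

// String2ByteSlice returns a []byte that shares memory with str (no copy).
// The returned slice must never be modified.
func String2ByteSlice(str string) []byte {
	return unsafe.Slice(unsafe.StringData(str), len(str))
}

// ByteSlice2String returns a string that shares memory with bs (no copy).
// bs must not be modified for as long as the string is in use.
func ByteSlice2String(bs []byte) string {
	if len(bs) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(bs), len(bs))
}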

So maybe pgx could simply convert between them as necessary.

> For string itself, it seems like I would have to unregister ByteaCodec and register a custom one? Am I mistaken? Or is there an easier way? Maybe just using a TryWrapScanPlanFuncs?

Yeah, either of these is a possibility, though you will still have the ambiguity of text vs. binary protocol.

> I am just thinking that since Go 1.20, there is a cheap way to convert between string and []byte:

I don't really see where these could be safely used. The read buffer is reused for every row. The unsafely constructed string would be corrupted by future rows.
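The corruption can be reproduced without a database. The sketch below simulates a reader that reuses one buffer for every row; scanRows and aliasString are hypothetical stand-ins for illustration, not pgx's actual read loop. Note that modifying bytes after passing them to unsafe.String is exactly what the unsafe package documentation forbids, so the behavior shown is what the standard toolchain happens to do, not something to rely on.

```go
package main

import (
	"fmt"
	"unsafe"
)

// aliasString builds a string that shares memory with bs, without copying.
func aliasString(bs []byte) string {
	if len(bs) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(bs), len(bs))
}

// scanRows simulates a driver that reuses a single read buffer for every row.
// It returns the rows once as zero-copy strings and once as ordinary copies.
func scanRows(rows []string) (aliased, copied []string) {
	buf := make([]byte, 4) // every simulated row is 4 bytes long
	for _, row := range rows {
		n := copy(buf, row) // the "next row" overwrites the previous one
		aliased = append(aliased, aliasString(buf[:n]))
		copied = append(copied, string(buf[:n]))
	}
	return aliased, copied
}

func main() {
	aliased, copied := scanRows([]string{"row1", "row2"})
	fmt.Println(copied)  // each copy keeps its own row
	fmt.Println(aliased) // the first entry no longer reads "row1"
}
```

Every zero-copy string points at the same buffer, so earlier rows silently take on the contents of the most recent one.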

> I don't really see where these could be safely used. The read buffer is reused for every row. The unsafely constructed string would be corrupted by future rows.

Oh, you are right, for scanning it does not work.