jeroenrinzema / psql-wire

PostgreSQL server wire protocol. Build your own server and start serving connections.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

handle `copy from stdin` query

hectorj opened this issue Β· comments

Hi, and first of all: thank you for your work! πŸ™‚

I am trying to use pgx's CopyFrom method on a server created with psql-wire.

Reading the spec here https://www.postgresql.org/docs/15/protocol-flow.html#PROTOCOL-COPY, the server is supposed to send a CopyInResponse message before receiving the data via multiple CopyData messages, ending with a CopyDone message.

I don't see a way to handle that currently inside a ParseFn, but maybe I missed something?

The query reaches my parser, and the wire.PreparedStatementFn is called just after, but then we can't properly process the whole operation.

Calling writer.Complete("CopyInResponse") makes the pgx client return without error, but with 0 rows inserted (and we never see the data to be copied server-side).

I'm not sure how this should be handled.

Maybe a different handler? That would make psql-wire responsible for parsing and recognizing a copy from query, I'm not sure that's great.

Or change the wire.DataWriter to be able to send the CopyInResponse and read the next messages? That may be better.

This may be a "won't fix", but I wanted to at least have a discussion about it.

Even if this hurdle falls, I know pgx forces the binary format on copy from so I won't be out of the woods yet πŸ˜„

πŸ‘‹ Hi @hectorj, thanks for opening an issue!

I indeed have not yet included support for the COPY protocol flow inside psql-wire. It is a feature I am interested in implementing. I would like to try to keep the server implementation as simple as possible, with the possibility of defining more complex flows if needed. It appears that the COPY protocol flow is not backwards compatible and would require its own handler methods. The COPY handler could accept a reader and writer interface, allowing the server to read or write rows from/to the client.

Support for text/binary values has been implemented inside the encoder. It is possible to easily define the encoding type while reading incoming messages.

It would be great if you could help me set up a small example project using the COPY protocol. I would like to inspect the packages being sent back and forth between the client and server using the psql-proxy CLI.

I tried setting up an example here: https://github.com/hectorj/psql-proxy-copy-example

But I keep hitting this error message when connecting to the proxy: ERROR unexpected error while reading a typed client message err="message size 1396920893, bigger than maximum allowed message size 16777216"

I don't have this issue with psql-wire or when connecting directly to postgres, so I'm not sure what I missed.

Ah, this might be a bug. This error is often thrown when an unexpected message is read from the client. I will take a look at this later today.

The psql proxy is unaware of the message flows. It might be that untyped messages are send during the COPY protocol flow.

I think it was due to SASL authentication.

Setting POSTGRES_HOST_AUTH_METHOD: md5 in the postgres container removes the error. Not much logged though, I'll keep digging.

Edit: completely wrong

Edit : removed because of many false assumptions

OK, I just had to set the tls flag even though I was using sslmode=disable.

Now here are the full proxy logs:

2023/10/14 16:48:33 INFO proxy listening address=:2345
2023/10/14 16:48:39 INFO incoming connection, dialing PostgreSQL server!
2023/10/14 16:48:39 INFO starting sniffing the PSQL packages
2023/10/14 16:48:39 INFO --> version=196608
2023/10/14 16:48:39 INFO <<- type=R msg="\x00\x00\x00\nSCRAM-SHA-256\x00\x00"
2023/10/14 16:48:39 INFO ->> type=p msg="SCRAM-SHA-256\x00\x00\x00\x00 n,,n=,r=CHzF7wyhaIC7QKYeU9KVkVi5"
2023/10/14 16:48:39 INFO <<- type=R msg="\x00\x00\x00\vr=CHzF7wyhaIC7QKYeU9KVkVi5wMPqcMqo0zKQD4t8h3ENdh0n,s=CV3OW7JkDKmPL7sfcEBvXw==,i=4096"
2023/10/14 16:48:39 INFO ->> type=p msg="c=biws,r=CHzF7wyhaIC7QKYeU9KVkVi5wMPqcMqo0zKQD4t8h3ENdh0n,p=eI+tM3zXgwD2TdcGz+a5BWsIyN9TO2yV0sRaCriuFP0="
2023/10/14 16:48:39 INFO <<- type=R msg="\x00\x00\x00\fv=yI9bfwCkkeyc+1u/PmNoMymUj9bLD5yVbgvdU11AVsY="
2023/10/14 16:48:39 INFO <<- type=R msg="\x00\x00\x00\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="application_name\x00\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="client_encoding\x00UTF8\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="DateStyle\x00ISO, MDY\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="default_transaction_read_only\x00off\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="in_hot_standby\x00off\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="integer_datetimes\x00on\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="IntervalStyle\x00postgres\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="is_superuser\x00on\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="server_encoding\x00UTF8\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="server_version\x0015.2 (Debian 15.2-1.pgdg110+1)\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="session_authorization\x00user\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="standard_conforming_strings\x00on\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="TimeZone\x00Etc/UTC\x00"
2023/10/14 16:48:39 INFO <<- type=K msg="\x00\x00\x00K/\xb5J1"
2023/10/14 16:48:39 INFO <<- type=Z msg=I
2023/10/14 16:48:39 INFO ->> type=Q msg="DROP TABLE IF EXISTS t CASCADE;\x00"
2023/10/14 16:48:39 INFO <<- type=C msg="DROP TABLE\x00"
2023/10/14 16:48:39 INFO <<- type=Z msg=I
2023/10/14 16:48:39 INFO ->> type=Q msg="CREATE TABLE public.t (id serial primary key, field1 text null, field2 int not null, field3 jsonb not null)\x00"
2023/10/14 16:48:39 INFO <<- type=C msg="CREATE TABLE\x00"
2023/10/14 16:48:39 INFO <<- type=Z msg=I
2023/10/14 16:48:39 INFO ->> type=P msg="stmtcache_1\x00select \"field1\", \"field2\", \"field3\" from \"public\".\"t\"\x00\x00\x00"
2023/10/14 16:48:39 INFO ->> type=D msg="Sstmtcache_1\x00"
2023/10/14 16:48:39 INFO ->> type=S msg=""
2023/10/14 16:48:39 INFO <<- type=1 msg=""
2023/10/14 16:48:39 INFO <<- type=t msg="\x00\x00"
2023/10/14 16:48:39 INFO <<- type=T msg="\x00\x03field1\x00\x00\x00@\x18\x00\x02\x00\x00\x00\x19\xff\xff\xff\xff\xff\xff\x00\x00field2\x00\x00\x00@\x18\x00\x03\x00\x00\x00\x17\x00\x04\xff\xff\xff\xff\x00\x00field3\x00\x00\x00@\x18\x00\x04\x00\x00\x0e\xda\xff\xff\xff\xff\xff\xff\x00\x00"
2023/10/14 16:48:39 INFO <<- type=Z msg=I
2023/10/14 16:48:39 INFO ->> type=Q msg="copy \"public\".\"t\" ( \"field1\", \"field2\", \"field3\" ) from stdin binary;\x00"
2023/10/14 16:48:39 INFO <<- type=G msg="\x01\x00\x03\x00\x01\x00\x01\x00\x01"
2023/10/14 16:48:39 INFO ->> type=d msg="PGCOPY\n\xff\r\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\ttest text\x00\x00\x00\x04\x00\x00\x00*\x00\x00\x00\x11\x01{\"some\": \"json\"}\x00\x03\xff\xff\xff\xff\x00\x00\x00\x04\x00\x00\x00\a\x00\x00\x00\x05\x01null"
2023/10/14 16:48:39 INFO ->> type=c msg=""
2023/10/14 16:48:39 INFO <<- type=C msg="COPY 2\x00"
2023/10/14 16:48:39 INFO <<- type=Z msg=I

My example repo is now functional if you want to try it.

Hi @hectorj thanks for the update! I apologize, but I'm currently dealing with some personal matters this week that require my attention. I will take a look at this issue as soon as possible. It seems the COPY flow simply has to be implemented for the messages to be handled correctly.

Hey, no worry: you don't have to apologize for not putting benevolent work first ^^

I'm actually testing another lower-level approach (closer to what psql-proxy does) for my project, so there is really no hurry at all.

@hectorj Did you find a solution that works for you? I'm needing something similar.