TonicAI / masquerade

A Postgres Proxy to Mask Data in Realtime

Home Page: https://www.tonic.ai

COPY statements

cgrenzel opened this issue

I am trying to run pg_dump, but it fails with the message below:

pg_dump: error: query failed: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
pg_dump: error: query was: COPY public.table_name (<COLUMNS_NAMES>) TO stdout;

pg_dump with insert statements works fine.

Because restoring with inserts takes too long, it would be nice to have support for COPY statements.

Trying not to be too hideous myself, is there interest in supporting this?
Thank you.

@cgrenzel I can look into supporting this but would very much like to know exactly how you are using Masquerade. Can you explain your use case a bit?

Hi @akamor. Thanks for this.

I am using Masquerade to generate a dump with some contents changed so it can be used in different environments.

I set the environment variables:
PGSSLMODE=disable
PGPASSWORD=***

Then I start up Masquerade, which connects to a database.

Then I run:
pg_dump -h "127.0.0.1" -p 20000 -U "${DB_USER}" --dbname="${DB_NAME}" | gzip >"${COMPRESSED_FILE_NAME}"

This pg_dump connects to the Masquerade proxy.

The initial pg_dump step of creating database objects works. It fails when it starts dumping the contents of a table, with the message:
pg_dump: error: query failed: server closed the connection unexpectedly
This probably means the server terminated abnormally before or while processing the request.
pg_dump: error: query was: COPY public.table_name (<COLUMNS_NAMES>) TO stdout;

Ok, so I haven't worked on Masquerade in quite a while, but I think I see the issue. When the bytes pass through the proxy, we listen for only two types of messages, specifically the DataRow and RowDescription messages (this can be seen on line 135 of PostgresBackendStateMachine.cs). You can read more about these message types here: https://www.postgresql.org/docs/9.3/protocol-flow.html#AEN99807

I suspect the COPY operation uses a different set of messages. I likely won't have time to look into this for at least a few weeks. That said, I could possibly help you add support, OR you can check out our paid product, which supports the masking/de-identification of production databases. You can learn more about it at https://tonic.ai

If the paid product is not your cup of tea just let me know and I'll see if I can get to this.
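
For anyone picking this up: per the PostgreSQL v3 protocol docs, a COPY ... TO stdout response is carried by CopyOutResponse ('H'), CopyData ('d') and CopyDone ('c') messages rather than RowDescription ('T') and DataRow ('D'). Below is a minimal Python sketch of the message framing (one type byte, then a 4-byte big-endian length that counts itself, then the payload). It is not Masquerade's code; the function names and the sample stream are made up for illustration.

```python
import struct
from typing import Iterator, List, Tuple

# Backend message type bytes from the PostgreSQL v3 protocol.
MESSAGE_NAMES = {
    b"T": "RowDescription",
    b"D": "DataRow",
    b"H": "CopyOutResponse",
    b"d": "CopyData",
    b"c": "CopyDone",
    b"C": "CommandComplete",
    b"Z": "ReadyForQuery",
}


def frame(msg_type: bytes, payload: bytes) -> bytes:
    """Build one message: type byte + Int32 length (includes itself) + payload."""
    return msg_type + struct.pack("!i", len(payload) + 4) + payload


def iter_messages(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, payload) for each complete message in buf."""
    offset = 0
    while offset + 5 <= len(buf):
        msg_type = buf[offset:offset + 1]
        (length,) = struct.unpack_from("!i", buf, offset + 1)
        end = offset + 1 + length
        if length < 4 or end > len(buf):
            break  # malformed or incomplete message; stop parsing
        yield msg_type, buf[offset + 5:end]
        offset = end


def classify(buf: bytes) -> List[str]:
    """Name each message in the stream, or 'Unknown' for unlisted types."""
    return [MESSAGE_NAMES.get(t, "Unknown") for t, _ in iter_messages(buf)]


def sample_copy_stream() -> bytes:
    """Hypothetical server reply to COPY ... TO stdout for one 2-column row."""
    # CopyOutResponse: overall format (0 = text), column count, per-column formats.
    copy_out = struct.pack("!bhhh", 0, 2, 0, 0)
    return b"".join([
        frame(b"H", copy_out),
        frame(b"d", b"1\talice\n"),   # one row in COPY text format
        frame(b"c", b""),             # CopyDone has an empty payload
        frame(b"C", b"COPY 1\x00"),   # command tag, NUL-terminated
        frame(b"Z", b"I"),            # ReadyForQuery, idle
    ])


if __name__ == "__main__":
    print(classify(sample_copy_stream()))
```

A proxy that only rewrites 'T' and 'D' messages would see none of the row data in this stream, since the rows travel inside CopyData payloads. Masking them would presumably mean parsing the COPY text (or binary) format inside each 'd' message.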

Thanks for taking a look at this and sharing your thoughts.

At the moment we are working around it by using a dump with multiple rows per INSERT. The restore is not as fast as with COPY, but it is feasible for our current needs.
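
In case it helps someone else, here is a rough Python sketch of how such a command could be assembled. It assumes pg_dump 12 or newer, where the --rows-per-insert option exists; the function name and the default of 1000 rows are only illustrative, and the host and port are the proxy address from the command earlier in this thread.

```python
from typing import List


def build_pg_dump_command(
    db_user: str,
    db_name: str,
    rows_per_insert: int = 1000,   # illustrative default, tune as needed
    host: str = "127.0.0.1",       # proxy address used earlier in this thread
    port: int = 20000,
) -> List[str]:
    """Return an argv list that dumps data as multi-row INSERT statements."""
    if rows_per_insert < 1:
        raise ValueError("rows_per_insert must be at least 1")
    return [
        "pg_dump",
        "-h", host,
        "-p", str(port),
        "-U", db_user,
        f"--dbname={db_name}",
        # Emits INSERT commands with up to this many rows each (pg_dump >= 12).
        f"--rows-per-insert={rows_per_insert}",
    ]


if __name__ == "__main__":
    print(" ".join(build_pg_dump_command("my_user", "my_db")))
```

The list can be passed to subprocess.run and its output piped to gzip as before.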

Will keep an eye on the paid product if our needs become more complex.

Appreciate your help!