jackc / pgx

PostgreSQL driver and toolkit for Go

Ignore ERROR: prepared statement already exists (SQLSTATE 42P05) error when preparing

jameshartig opened this issue · comments

Describe the bug
It's not clear what happened but we had a bunch of the following errors:
expected statement description, got *pgconn.ResultReader
expected statement description, got *pgconn.PipelineSync
unexpected pipeline result: *pgconn.PipelineSync
unexpected pipeline result: *pgconn.StatementDescription

Then the connection was forever throwing the error ERROR: prepared statement 'stmtcache_2fdb49e13e67d391f9a11b5965c6bdc9427a2b5e65613919' already exists (SQLSTATE 42P05).

To Reproduce
Unfortunately, I have no idea how to reproduce this. There was a bunch of packet loss in GCP at the time this occurred so it could have been something related to that.

We issue this particular SQL like:

b := new(pgx.Batch)
b.Queue("BEGIN READ ONLY")
b.Queue(query, arg)
b.Queue("COMMIT")
err := pool.SendBatch(ctx, b).Close() // Close drains the results and returns the first error

Expected behavior
Despite the pipeline parsing failing, I would expect future prepares to handle 42P05, especially since Prepare says it's idempotent.

Alternatively, if we get an unexpected pipeline result type, should we maybe call asyncClose so the connection can be thrown away?

Actual behavior
Instead, that query failed forever on the connection with the error ERROR: prepared statement 'stmtcache_2fdb49e13e67d391f9a11b5965c6bdc9427a2b5e65613919' already exists (SQLSTATE 42P05).

Version

  • Go: $ go version -> 1.21.5
  • PostgreSQL: $ psql --no-psqlrc --tuples-only -c 'select version()' -> PostgreSQL 11.2-YB-2.18.2.2-b0 on x86_64-pc-linux-gnu, compiled by clang version 15.0.3 (https://github.com/yugabyte/llvm-project.git 0b8d1183745fd3998d8beffeec8cbe99c1b20529), 64-bit
  • pgx: $ grep 'github.com/jackc/pgx/v[0-9]' go.mod -> v5.5.0

Additional context

My best guess is that there was maybe an error here and we failed to close the connection:

diff --git a/pgconn/pgconn.go b/pgconn/pgconn.go
index d5a67bea..49319350 100644
--- a/pgconn/pgconn.go
+++ b/pgconn/pgconn.go
@@ -2117,6 +2117,7 @@ func (p *Pipeline) getResults() (results any, err error) {
                case *pgproto3.ParseComplete:
                        peekedMsg, err := p.conn.peekMessage()
                        if err != nil {
+                               p.conn.asyncClose()
                                return nil, err
                        }
                        if _, ok := peekedMsg.(*pgproto3.ParameterDescription); ok {

If I happen to get a network error right at that peekMessage() then I can reproduce the issue.

I think you are correct about the peekMessage error. Fixed in cbc5a70 along with normalizing the error.

Not sure if that actually will resolve the original issue though.

The idempotency of Prepare is based on it keeping track of all prepared statements and being a no-op on repeats. It doesn't handle getting out of sync with the server. I suppose we could handle that case, but it seems like it might be masking an underlying issue.

That makes sense, I'm just trying to figure out how to recover from this scenario. The solution might be to just kill the connection using the new OnPgError. I could try to call Deallocate but it isn't simple for me to generate the hash for the SQL since that's an internal detail.
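One way to recover without knowing the internal statement name is to decide, per server error, whether the connection is worth keeping. The sketch below factors that decision into a plain function so it can be shown standalone; the commented wiring via pgconn's OnPgError hook (which, as I understand the v5 API, closes the connection when the handler returns false) is an assumption about how you'd plug it into a pool config, not tested code.

```go
package main

import "fmt"

// sqlstatePreparedStatementExists is the SQLSTATE PostgreSQL reports
// when a prepared statement name collides on the server.
const sqlstatePreparedStatementExists = "42P05"

// closeOnStaleStatement decides whether a connection should survive a
// server error. Returning false means "discard the connection", which
// also discards its out-of-sync statement cache state.
func closeOnStaleStatement(code string) bool {
	return code != sqlstatePreparedStatementExists
}

// Hypothetical wiring into a pool config (assumes pgx v5's
// pgconn.Config.OnPgError hook; returning false closes the conn):
//
//	cfg, _ := pgxpool.ParseConfig(dsn)
//	cfg.ConnConfig.OnPgError = func(c *pgconn.PgConn, e *pgconn.PgError) bool {
//		return closeOnStaleStatement(e.Code)
//	}

func main() {
	fmt.Println(closeOnStaleStatement("42P05")) // false: discard the connection
	fmt.Println(closeOnStaleStatement("23505")) // true: unrelated error, keep it
}
```

This trades a connection re-establishment for correctness, which seems acceptable given how rarely the 42P05 state should occur.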

@jackc We just saw this error on pgx 5.5.3!
ERROR: prepared statement "stmtcache_e3bbdda88e6fb7fb4a24ad6c6d84a6a52aba9fd5c8667677" already exists (SQLSTATE 42P05)

For more information: we have a pgxpool that runs out of statement cache space relatively frequently (a few times a day), and we also have very strict context timeouts, so we often have conns getting timed out. Not sure if that info helps.

We also experienced this error intermittently and had to roll back to 5.4, but unfortunately it's quite hard to track down the situation that leads to this. I will add that we also use context timeouts extensively, so perhaps there could be clue in there.

I think I found the underlying issue. If an error occurred when deallocating invalidated statements from the cache they would be removed from the cache but not actually deallocated on the server. In particular this could happen when a context was canceled before the query was even started.

This problem is fixed in 832b4f9 and included in the just tagged v5.5.3.

@jackc Looking over that commit (and checking my PG knowledge) is it correct to say that the pipeline only succeeds or fails atomically? Or is it that deallocates are idempotent so potentially resending them is safe?

is it correct to say that the pipeline only succeeds or fails atomically?

Technically no, but practically yes. Preparing and deallocating statements at the protocol layer is not transactional. If I recall correctly it may even work in a broken transaction. (This might vary between real PostgreSQL and more or less compatible databases like CockroachDB.) But pretty much the only way it could partially succeed is if the network connection was interrupted. But if that happened the connection is dead anyway.

Or is it that deallocates are idempotent so potentially resending them is safe?

Protocol level deallocates are idempotent.

FYI @jackc we just saw the same error again on 5.5.3. Interestingly, this time it was in a pgx.Batch execution that is part of a larger transaction.

@jacobmikesell The problem only occurred in very specific circumstances.

If the size of the statement cache is exceeded then a query is evicted from the cache, but it is only marked as invalid; it is not actually deallocated on the server.

Before the next query is sent all invalidated statements are deallocated.

However, deallocating invalidated statements did not occur inside of a transaction. I believe this was a relic of the older deallocate code that actually called a DEALLOCATE SQL statement instead of using the underlying protocol operation.

This wasn't a problem for regular queries, as the entry still exists in Conn.preparedStatements, so the call to Prepare() is idempotent. However, batch queries do not call Prepare(). They send all prepare statement requests in bulk.

So if a transaction was in progress, and a statement was evicted from the cache, and the batch then tried to send that exact query, then the error would occur.
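To make the sequence concrete, here is a toy simulation of the bookkeeping described above: a client-side cache with eviction, a set of statements the server believes are prepared, and a flush step that (modeling the pre-fix behavior) is skipped inside a transaction. All names and types are hypothetical; this is an illustration of the failure mode, not pgx internals.

```go
package main

import "fmt"

// toyCache simulates the client-side statement cache bookkeeping.
type toyCache struct {
	capacity    int
	cached      []string        // client-side cache, oldest first
	invalidated []string        // evicted but not yet deallocated
	server      map[string]bool // statements actually prepared on the server
}

// prepare sends a Parse for name; the server rejects duplicate names.
func (c *toyCache) prepare(name string) error {
	if c.server[name] {
		return fmt.Errorf("ERROR: prepared statement %q already exists (SQLSTATE 42P05)", name)
	}
	c.server[name] = true
	if len(c.cached) == c.capacity {
		// Evict the oldest entry: removed from the client cache but
		// only marked invalid, not yet deallocated on the server.
		c.invalidated = append(c.invalidated, c.cached[0])
		c.cached = c.cached[1:]
	}
	c.cached = append(c.cached, name)
	return nil
}

// flushInvalidated deallocates evicted statements on the server.
// Pre-fix behavior: skipped while a transaction is in progress.
func (c *toyCache) flushInvalidated(inTx bool) {
	if inTx {
		return
	}
	for _, name := range c.invalidated {
		delete(c.server, name)
	}
	c.invalidated = nil
}

func main() {
	c := &toyCache{capacity: 1, server: map[string]bool{}}
	c.prepare("stmt_a")      // cached
	c.prepare("stmt_b")      // evicts stmt_a, marks it invalid
	c.flushInvalidated(true) // in a transaction: no-op before the fix

	// A batch now re-sends Parse for stmt_a without going through the
	// idempotent Prepare() path, so the server rejects the duplicate:
	err := c.prepare("stmt_a")
	fmt.Println(err)
}
```

With the fix, the flush also runs inside a (non-aborted) transaction, so the server-side entry is gone before the batch re-sends its Parse.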

Should be resolved in 046f497. It allows deallocating invalidated statements inside of a transaction that is not in an error state.

@jackc sorry for the comment spam but just wanted to give you kudos for the investigations in this issue! Hopefully this is the last instance of this and we can close it out.

Thanks for looking into and fixing this :D Is there an estimate for when this will hit a minor version? We bump into it about once or twice a week (no good repro on our end), but I'd love to pull the new minor version and test it out!

@jacobmikesell I expect a new patch release next week.

Thank you for working on a fix for that. We needed to roll it back after bumping into this problem as well.
Is there any news on the patch release?

@jacobmikesell The fix was included in v5.5.5.