jackc / pgx

PostgreSQL driver and toolkit for Go

Ignore ERROR: prepared statement already exists (SQLSTATE 42P05) error when preparing

jameshartig opened this issue · comments

Describe the bug
It's not clear what happened but we had a bunch of the following errors:
expected statement description, got *pgconn.ResultReader
expected statement description, got *pgconn.PipelineSync
unexpected pipeline result: *pgconn.PipelineSync
unexpected pipeline result: *pgconn.StatementDescription

Then the connection was forever throwing the error ERROR: prepared statement 'stmtcache_2fdb49e13e67d391f9a11b5965c6bdc9427a2b5e65613919' already exists (SQLSTATE 42P05).

To Reproduce
Unfortunately, I have no idea how to reproduce this. There was a bunch of packet loss in GCP at the time this occurred so it could have been something related to that.

We issue this particular SQL like:

b := new(pgx.Batch)
b.Queue("BEGIN READ ONLY")
b.Queue(query, arg)
b.Queue("COMMIT")
err := pool.SendBatch(ctx, b).Close() // Close drains the results and returns the first error

Expected behavior
Despite the pipeline parsing failing, I would expect future prepares to handle 42P05, especially since Prepare says it's idempotent.

Alternatively, if we get an unexpected pipeline result type, should we maybe call asyncClose so the connection can be thrown away?

Actual behavior
Instead, that query failed forever on the connection with the error ERROR: prepared statement 'stmtcache_2fdb49e13e67d391f9a11b5965c6bdc9427a2b5e65613919' already exists (SQLSTATE 42P05).

Version

  • Go: $ go version -> 1.21.5
  • PostgreSQL: $ psql --no-psqlrc --tuples-only -c 'select version()' -> PostgreSQL 11.2-YB-2.18.2.2-b0 on x86_64-pc-linux-gnu, compiled by clang version 15.0.3 (https://github.com/yugabyte/llvm-project.git 0b8d1183745fd3998d8beffeec8cbe99c1b20529), 64-bit
  • pgx: $ grep 'github.com/jackc/pgx/v[0-9]' go.mod -> v5.5.0

Additional context

My best guess is that there was maybe an error here and we failed to close the connection:

diff --git a/pgconn/pgconn.go b/pgconn/pgconn.go
index d5a67bea..49319350 100644
--- a/pgconn/pgconn.go
+++ b/pgconn/pgconn.go
@@ -2117,6 +2117,7 @@ func (p *Pipeline) getResults() (results any, err error) {
                case *pgproto3.ParseComplete:
                        peekedMsg, err := p.conn.peekMessage()
                        if err != nil {
+                               p.conn.asyncClose()
                                return nil, err
                        }
                        if _, ok := peekedMsg.(*pgproto3.ParameterDescription); ok {

If I happen to get a network error right at that peekMessage() then I can reproduce the issue.

I think you are correct about the peekMessage error. Fixed in cbc5a70 along with normalizing the error.

Not sure if that actually will resolve the original issue though.

The idempotency of Prepare is based on it keeping track of all prepared statements and being a no-op on repeats. It doesn't handle getting out of sync with the server. I suppose we could handle that case, but it seems like it might be masking an underlying issue.

That makes sense, I'm just trying to figure out how to recover from this scenario. The solution might be to just kill the connection using the new OnPgError. I could try to call Deallocate but it isn't simple for me to generate the hash for the SQL since that's an internal detail.
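One way to recover without knowing the internal statement name is to decide, per server error, whether the connection is worth keeping. The sketch below factors that decision into a plain function so it can be shown standalone; the commented wiring via pgconn's OnPgError hook (which, as I understand the v5 API, closes the connection when the handler returns false) is an assumption about how you'd plug it into a pool config, not tested code.

```go
package main

import "fmt"

// sqlstatePreparedStatementExists is the SQLSTATE PostgreSQL reports
// when a prepared statement name collides on the server.
const sqlstatePreparedStatementExists = "42P05"

// closeOnStaleStatement decides whether a connection should survive a
// server error. Returning false means "discard the connection", which
// also discards its out-of-sync statement cache state.
func closeOnStaleStatement(code string) bool {
	return code != sqlstatePreparedStatementExists
}

// Hypothetical wiring into a pool config (assumes pgx v5's
// pgconn.Config.OnPgError hook; returning false closes the conn):
//
//	cfg, _ := pgxpool.ParseConfig(dsn)
//	cfg.ConnConfig.OnPgError = func(c *pgconn.PgConn, e *pgconn.PgError) bool {
//		return closeOnStaleStatement(e.Code)
//	}

func main() {
	fmt.Println(closeOnStaleStatement("42P05")) // false: discard the connection
	fmt.Println(closeOnStaleStatement("23505")) // true: unrelated error, keep it
}
```

This trades a connection re-establishment for correctness, which seems acceptable given how rarely the 42P05 state should occur.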

@jackc We just saw this error on pgx 5.5.3!
ERROR: prepared statement "stmtcache_e3bbdda88e6fb7fb4a24ad6c6d84a6a52aba9fd5c8667677" already exists (SQLSTATE 42P05)

For more information: we have a pgxpool that runs out of statement cache space relatively frequently (a few times a day), and we also have very strict context timeouts, so we often have conns getting timed out. Not sure if that info helps.

We also experienced this error intermittently and had to roll back to 5.4, but unfortunately it's quite hard to track down the situation that leads to this. I will add that we also use context timeouts extensively, so perhaps there could be clue in there.

I think I found the underlying issue. If an error occurred when deallocating invalidated statements from the cache they would be removed from the cache but not actually deallocated on the server. In particular this could happen when a context was canceled before the query was even started.

This problem is fixed in 832b4f9 and included in the just tagged v5.5.3.

@jackc Looking over that commit (and checking my PG knowledge) is it correct to say that the pipeline only succeeds or fails atomically? Or is it that deallocates are idempotent so potentially resending them is safe?

is it correct to say that the pipeline only succeeds or fails atomically?

Technically no, but practically yes. Preparing and deallocating statements at the protocol layer is not transactional. If I recall correctly it may even work in a broken transaction. (This might vary between real PostgreSQL and more or less compatible databases like CockroachDB.) But pretty much the only way it could partially succeed is if the network connection was interrupted. But if that happened the connection is dead anyway.

Or is it that deallocates are idempotent so potentially resending them is safe?

Protocol level deallocates are idempotent.

FYI @jackc we just saw the same error again on 5.5.3. Interestingly, this time it was in a pgx.Batch execution that is part of a larger transaction.

@jacobmikesell The problem only occurred in very specific circumstances.

If the size of the statement cache is exceeded then a query is evicted from the cache, but it is only marked as invalid; it is not actually deallocated on the server.

Before the next query is sent all invalidated statements are deallocated.

However, deallocating invalidated statements did not occur inside of a transaction. I believe this was a relic of the older deallocate code that actually called a DEALLOCATE SQL statement instead of using the underlying protocol operation.

This wasn't a problem for regular queries, as the entry still exists in Conn.preparedStatements, so the call to Prepare() is idempotent. However, batch queries do not call Prepare(). They send all prepare statement requests in bulk.

So if a transaction was in progress, and a statement was evicted from the cache, and the batch then tried to send that exact query, then the error would occur.
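To make the sequence concrete, here is a toy simulation of the bookkeeping described above: a client-side cache with eviction, a set of statements the server believes are prepared, and a flush step that (modeling the pre-fix behavior) is skipped inside a transaction. All names and types are hypothetical; this is an illustration of the failure mode, not pgx internals.

```go
package main

import "fmt"

// toyCache simulates the client-side statement cache bookkeeping.
type toyCache struct {
	capacity    int
	cached      []string        // client-side cache, oldest first
	invalidated []string        // evicted but not yet deallocated
	server      map[string]bool // statements actually prepared on the server
}

// prepare sends a Parse for name; the server rejects duplicate names.
func (c *toyCache) prepare(name string) error {
	if c.server[name] {
		return fmt.Errorf("ERROR: prepared statement %q already exists (SQLSTATE 42P05)", name)
	}
	c.server[name] = true
	if len(c.cached) == c.capacity {
		// Evict the oldest entry: removed from the client cache but
		// only marked invalid, not yet deallocated on the server.
		c.invalidated = append(c.invalidated, c.cached[0])
		c.cached = c.cached[1:]
	}
	c.cached = append(c.cached, name)
	return nil
}

// flushInvalidated deallocates evicted statements on the server.
// Pre-fix behavior: skipped while a transaction is in progress.
func (c *toyCache) flushInvalidated(inTx bool) {
	if inTx {
		return
	}
	for _, name := range c.invalidated {
		delete(c.server, name)
	}
	c.invalidated = nil
}

func main() {
	c := &toyCache{capacity: 1, server: map[string]bool{}}
	c.prepare("stmt_a")      // cached
	c.prepare("stmt_b")      // evicts stmt_a, marks it invalid
	c.flushInvalidated(true) // in a transaction: no-op before the fix

	// A batch now re-sends Parse for stmt_a without going through the
	// idempotent Prepare() path, so the server rejects the duplicate:
	err := c.prepare("stmt_a")
	fmt.Println(err)
}
```

With the fix, the flush also runs inside a (non-aborted) transaction, so the server-side entry is gone before the batch re-sends its Parse.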

Should be resolved in 046f497. It allows deallocating invalidated statements inside of a transaction that is not in an error state.

@jackc sorry for the comment spam but just wanted to give you kudos for the investigations in this issue! Hopefully this is the last instance of this and we can close it out.

Thanks for looking into and fixing this :D Is there an estimate for when this will hit a minor version? We bump into it about once or twice a week (no good repro on our end), but I'd love to pull the new minor version and test it out!

@jacobmikesell I expect a new patch release next week.

Thank you for working on a fix for that. We needed to roll it back after bumping into this problem as well.
Is there any news on the patch release?

@jacobmikesell The fix was included in v5.5.5.