porsager / postgres

Postgres.js - The Fastest full featured PostgreSQL client for Node.js, Deno, Bun and CloudFlare

Connection pool sometimes does not end properly

langpavel opened this issue · comments

Hello. I discovered a race condition with sql.end().

It seems that if more queries are queued than the max pool size, something hangs.

I'm running on Deno 1.43.1 with postgres.js v3.4.4.

When CLOSE_POOL_TIMEOUT is around 33, there is a roughly 50/50 chance of either outcome below.
If CLOSE_POOL_TIMEOUT is much greater, the process hangs.
If CLOSE_POOL_TIMEOUT is smaller, the process exits normally.

import postgres from "https://deno.land/x/postgresjs/mod.js";

const CLOSE_POOL_TIMEOUT = 33;

const sql = postgres({
  debug: true,
  max: 2, // small pool so queued queries outnumber available connections
  connect_timeout: 1,
});

const log = (str: string, ...rest: unknown[]) =>
  console.log(
    `[${(performance.now() / 1000).toFixed(3).padStart(6)}s] ${str}`,
    ...rest,
  );

const delay = (ms: number) => new Promise((r) => setTimeout(r, ms));

// Query that holds a connection for ~50 ms.
const longQuery = () =>
  sql`SELECT CURRENT_TIMESTAMP as ts, pg_sleep(0.05) as sleep`.execute();

let ended = false;
const parallelQueries = async () => {
  // Start a new query every 10 ms until the pool is ended.
  for (let i = 1; i <= 100; i++) {
    await delay(10);

    if (ended) {
      return;
    }

    const queryId = i;
    log(`${queryId} run`);
    longQuery()
      .then((r) => {
        log(`${queryId} OK:  ts: ${r?.[0]?.ts.toISOString()}`);
      })
      .catch((e) => {
        log(`${queryId} ERR:`, e.message);
      });
  }
};

parallelQueries()
  .then(() => {
    log("parallelQueries done");
  })
  .catch(() => {
    log("parallelQueries error");
  });

// Close the pool while queries are still queued / in flight.
setTimeout(async () => {
  log("closing pool...");
  await sql.end({
    timeout: 1,
  });
  ended = true;
  log("pool closed");
}, CLOSE_POOL_TIMEOUT);

globalThis.addEventListener("unload", () => {
  log("goodbye!");
});

The output is as follows:
First run: OK, finished successfully 👍

% deno run -A postgres/client.ts
[ 0.106s] 1 run
[ 0.121s] 2 run
[ 0.130s] closing pool...
[ 0.132s] 3 run
[ 0.132s] 3 ERR: write CONNECTION_ENDED /var/run/postgresql/.s.PGSQL.5444
[ 0.143s] 4 run
[ 0.143s] 4 ERR: write CONNECTION_ENDED /var/run/postgresql/.s.PGSQL.5444
[ 0.156s] 5 run
[ 0.156s] 5 ERR: write CONNECTION_ENDED /var/run/postgresql/.s.PGSQL.5444
[ 0.167s] 6 run
[ 0.167s] 6 ERR: write CONNECTION_ENDED /var/run/postgresql/.s.PGSQL.5444
[ 0.172s] 1 OK:  ts: 2024-05-04T19:50:46.984Z
[ 0.179s] 7 run
[ 0.179s] 7 ERR: write CONNECTION_ENDED /var/run/postgresql/.s.PGSQL.5444
[ 0.181s] 2 OK:  ts: 2024-05-04T19:50:46.994Z
[ 0.181s] pool closed
[ 0.190s] parallelQueries done
[ 0.192s] goodbye!

Second run does not terminate. 🤕

% deno run -A postgres/client.ts
[ 0.102s] 1 run
[ 0.114s] 2 run
[ 0.126s] 3 run  <-- 3rd client queued when pool max == 2
[ 0.126s] closing pool...
[ 0.138s] 4 run
[ 0.138s] 4 ERR: write CONNECTION_ENDED /var/run/postgresql/.s.PGSQL.5444
[ 0.151s] 5 run
[ 0.151s] 5 ERR: write CONNECTION_ENDED /var/run/postgresql/.s.PGSQL.5444
[ 0.162s] 6 run
[ 0.162s] 6 ERR: write CONNECTION_ENDED /var/run/postgresql/.s.PGSQL.5444
[ 0.165s] 1 OK:  ts: 2024-05-04T19:50:48.303Z
[ 0.173s] 7 run
[ 0.173s] 7 ERR: write CONNECTION_ENDED /var/run/postgresql/.s.PGSQL.5444
[ 0.175s] 2 OK:  ts: 2024-05-04T19:50:48.314Z
[ 0.176s] pool closed
[ 0.183s] parallelQueries done
[ 0.234s] 3 OK:  ts: 2024-05-04T19:50:48.372Z  <-- response after pool closed?
^C
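
As a possible workaround (an untested sketch, not verified against the library internals): tracking the in-flight query promises, stopping the queueing first, and awaiting them before sql.end() should mean no query is still queued when the pool starts closing. This replaces the longQuery() calls in the loop above with trackedQuery() and swaps out the shutdown timer:

// Track in-flight queries so shutdown can wait for them to settle.
const pending = new Set<Promise<unknown>>();

const trackedQuery = () => {
  const p = longQuery().finally(() => pending.delete(p));
  pending.add(p);
  return p;
};

setTimeout(async () => {
  ended = true;                      // stop queueing new queries first
  await Promise.allSettled(pending); // let queued/in-flight queries settle
  log("closing pool...");
  await sql.end({ timeout: 1 });
  log("pool closed");
}, CLOSE_POOL_TIMEOUT);
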
commented

May not be the same issue, but we're also seeing client.end() not working / not allowing Node.js to exit gracefully. When we search for the open handles, it points to this:

# PBKDF2REQUEST
node:internal/async_hooks:202
node:internal/crypto/pbkdf2:65
/node_modules/.pnpm/postgres@3.4.4/node_modules/postgres/cjs/src/connection.js:691 - const saltedPassword = await crypto.pbkdf2Sync(
node:internal/process/task_queues:95
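
For reference, one way to list what is keeping a Node.js process alive after end() (a sketch assuming Node >= 17.3, where the experimental process.getActiveResourcesInfo() is available):

import postgres from "postgres";

const sql = postgres({ max: 2 });

await sql`SELECT 1`;
await sql.end({ timeout: 1 });

// Prints the resource types still holding the event loop open,
// e.g. "PBKDF2REQUEST" or "TCPSocketWrap".
console.log(process.getActiveResourcesInfo());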

If #869 gets fixed, you might not need to call client.end() or sql.end().

commented

It's an interesting proposal, but that wouldn't fix the underlying issue here as this seems to be a resource/memory leak.

Say I ran a fleet of databases and scripted some administration across the fleet: the Node process would consume more and more memory as it processed each database, rather than freeing it when end() is called.
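
To make that concrete, a minimal sketch of the scenario (hypothetical host names and admin query; the point is the per-database client lifecycle, not the query itself):

import postgres from "postgres";

const hosts = ["db-1.internal", "db-2.internal", "db-3.internal"];

for (const host of hosts) {
  const sql = postgres({ host, max: 2 });
  try {
    await sql`SELECT count(*) FROM pg_stat_activity`;
  } finally {
    // If end() leaks resources, memory grows with each iteration
    // instead of being released here.
    await sql.end({ timeout: 5 });
  }
}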

#869 would mask/hide the leak in certain scenarios (process exit), but at the same time it could make further resource leaks harder to detect.