zalando / patroni

A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes

Patroni does not perform a switchover when the primary server's disk is full

udit47 opened this issue

What happened?

I have a 3-node cluster (2 PostgreSQL servers and 3 etcd members on shared nodes). Recently the primary server ran out of space on the disk where the WAL files are stored, and PostgreSQL eventually crashed. However, no switchover happened.
At that time, patronictl status showed "Crashed" as the state of the Leader and "Running" as the state of the Replica. Eventually I freed up the space, and PostgreSQL recovered from the crash and started serving requests again.

How can we reproduce it (as minimally and precisely as possible)?

Use the same setup as mine and generate enough WAL to fill the disk that holds the WAL files, so that PostgreSQL crashes.
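
When sizing a reproduction, it may help to estimate how many WAL segments must pile up before the WAL volume is full. The following is only a rough sketch: the helper names `wal_segments_until_full` and `free_bytes_on` are hypothetical, the path `"."` is a placeholder for the WAL directory, and the 16 MiB segment size is taken from the "Bytes per WAL segment" value in the pg_controldata output included in this report.

```python
import shutil

# "Bytes per WAL segment" as reported by pg_controldata in this issue.
WAL_SEGMENT_BYTES = 16777216


def wal_segments_until_full(free_bytes: int, segment_bytes: int = WAL_SEGMENT_BYTES) -> int:
    """Return how many whole WAL segments still fit into free_bytes."""
    if free_bytes < 0 or segment_bytes <= 0:
        raise ValueError("free_bytes must be >= 0 and segment_bytes must be > 0")
    return free_bytes // segment_bytes


def free_bytes_on(path: str) -> int:
    """Return the free space, in bytes, on the filesystem that holds path."""
    return shutil.disk_usage(path).free


if __name__ == "__main__":
    # "." is a placeholder; point this at the WAL directory of a test cluster.
    free = free_bytes_on(".")
    print(f"{wal_segments_until_full(free)} more WAL segments fit before the disk is full")
```

The estimate ignores WAL recycling and archiving, so it is a lower bound on how much WAL a test workload has to produce.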

What did you expect to happen?

A switchover to the standby should have happened.

Patroni/PostgreSQL/DCS version

  • Patroni version: 2.1.3
  • PostgreSQL version: 14.10
  • DCS (and its version): etcd 3.3.25, API version 2

Patroni configuration file

restapi:
  listen: <ip>:8008
  connect_address: <ip>:8008

etcd:
  host: <ip>:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: on
        logging_collector: on
        max_wal_senders: 5
        max_replication_slots: 5
        wal_log_hints: on
        track_functions: all
        max_connections: 1000
        max_worker_processes: '8'
        log_line_prefix: '%t [%p]: user=%u,db=%d,app=%a,client=%h'
        log_destination: 'stderr'
        log_filename: 'postgresql-%Y-%m-%d.log'
        log_rotation_age: 1d
        log_directory: '/var/log/postgresql'

  initdb:
  - encoding: UTF8
  - data-checksums

  pg_hba:
  - host replication replicator 0.0.0.0/0 scram-sha-256
  - host replication replicator 127.0.0.1/32 trust
  - host all all 0.0.0.0/0 scram-sha-256

postgresql:
  listen: <ip>:5432
  connect_address: <ip>:5432
  data_dir: "/dt/pgsql/14"
  bin_dir: "/usr/lib/postgresql/14/bin"
  pgpass: /tmp/pgpass0
  basebackup:
  - waldir: "/clog/pgsql/14"
  parameters:
    unix_socket_directories: '/var/run/postgresql'
    archive_command: pgbackrest --stanza=bkp_stanza archive-push "%p"
    archive_mode: "on"

log:
  level: INFO
  traceback_level: ERROR

patronictl show-config

loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    hot_standby: true
    log_destination: stderr
    log_directory: /var/log/postgresql
    log_filename: postgresql-%Y-%m-%d.log
    log_line_prefix: '%t [%p]: user=%u,db=%d,app=%a,client=%h'
    log_rotation_age: 1d
    logging_collector: true
    max_connections: 1000
    max_replication_slots: 5
    max_wal_senders: 5
    max_worker_processes: '8'
    shared_preload_libraries: pglogical
    track_functions: all
    wal_level: logical
    wal_log_hints: true
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 30
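
As a quick sanity check on the timing values above: the Patroni documentation asks that `loop_wait + 2 * retry_timeout <= ttl` hold whenever these settings are changed. The sketch below checks that rule for the values shown by patronictl show-config. The function name `dcs_timing_ok` is hypothetical and not part of Patroni.

```python
def dcs_timing_ok(ttl: int, loop_wait: int, retry_timeout: int) -> bool:
    """Check the rule loop_wait + 2 * retry_timeout <= ttl."""
    return loop_wait + 2 * retry_timeout <= ttl


# Values copied from the patronictl show-config output in this report.
config = {"ttl": 30, "loop_wait": 10, "retry_timeout": 10}

if __name__ == "__main__":
    print(dcs_timing_ok(**config))
```

With the values in this report the rule holds exactly (10 + 2 * 10 = 30).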

Patroni log files

pg-node-0 (Leader)

2023-12-27 18:33:18,133 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 676, in query
    cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 610, in get_postgresql_status
    row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 592, in query
    return self.server.query(sql, *params)
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 681, in query
    raise PostgresConnectionException('connection problems')
patroni.exceptions.PostgresConnectionException: 'connection problems'
2023-12-27 18:33:18,134 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 676, in query
    cursor.execute(sql, params)
psycopg2.OperationalError: no connection to the server


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 610, in get_postgresql_status
    row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 592, in query
    return self.server.query(sql, *params)
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 681, in query
    raise PostgresConnectionException('connection problems')
patroni.exceptions.PostgresConnectionException: 'connection problems'
2023-12-27 18:33:18,251 INFO: Lock owner: pg-node-0; I am pg-node-0
2023-12-27 18:33:18,251 INFO: establishing a new patroni connection to the postgres cluster
2023-12-27 18:33:19,660 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 675, in query
    with self.patroni.postgresql.connection().cursor() as cursor:
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 255, in connection
    return self._connection.get()
  File "/usr/lib/python3/dist-packages/patroni/postgresql/connection.py", line 24, in get
    self._connection = psycopg.connect(**self._conn_kwargs)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "<primary_ip>", port 5432 failed: FATAL:  the database system is in recovery mode


2023-12-27 18:33:19,772 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
2023-12-27 18:33:19,772 ERROR: Exception when called state_handler.last_operation()
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/ha.py", line 157, in update_lock
    last_lsn = self.state_handler.last_operation()
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 992, in last_operation
    return self._wal_position(self.is_leader(), self._cluster_info_state_get('wal_position'),
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 354, in _cluster_info_state_get
    raise PostgresConnectionException(self._cluster_info_state['error'])
patroni.exceptions.PostgresConnectionException: "'Too many retry attempts'"
2023-12-27 18:33:19,777 WARNING: Failed to determine PostgreSQL state from the connection, falling back to cached role
2023-12-27 18:33:19,780 INFO: Error communicating with PostgreSQL. Will try again later
2023-12-27 18:33:28,259 WARNING: Postgresql is not running.
2023-12-27 18:33:28,259 INFO: Lock owner: pg-node-0; I am pg-node-0

2023-12-27 18:33:28,275 INFO: doing crash recovery in a single user mode
2023-12-27 18:33:30,782 ERROR: Crash recovery finished with code=-6
2023-12-27 18:33:30,782 INFO:  stdout=
PostgreSQL stand-alone backend 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1)
backend> 
2023-12-27 18:33:30,782 INFO:  stderr=2023-12-27 18:33:28 IST [2444981]: user=,db=,app=,client=LOG:  database system was interrupted while in recovery at 2023-12-27 18:33:20 IST
2023-12-27 18:33:28 IST [2444981]: user=,db=,app=,client=HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
2023-12-27 18:33:29 IST [2444981]: user=,db=,app=,client=LOG:  database system was not properly shut down; automatic recovery in progress
2023-12-27 18:33:29 IST [2444981]: user=,db=,app=,client=LOG:  redo starts at 588/BA161170
2023-12-27 18:33:30 IST [2444981]: user=,db=,app=,client=LOG:  redo done at 588/D7FFFF28 system usage: CPU: user: 0.87 s, system: 0.13 s, elapsed: 1.01 s
2023-12-27 18:33:30 IST [2444981]: user=,db=,app=,client=PANIC:  could not write to file "pg_wal/xlogtemp.2444981": No space left on device

2023-12-27 18:33:38,251 WARNING: Postgresql is not running.
2023-12-27 18:33:38,251 INFO: Lock owner: pg-node-0; I am pg-node-0

2023-12-27 18:33:38,255 INFO: Lock owner: pg-node-0; I am pg-node-0
2023-12-27 18:33:38,258 INFO: starting as readonly because i had the session lock
2023-12-27 18:33:38,462 INFO: postmaster pid=2444999
2023-12-27 18:33:39,475 INFO: Lock owner: pg-node-0; I am pg-node-0
2023-12-27 18:33:39,475 INFO: establishing a new patroni connection to the postgres cluster
2023-12-27 18:33:39,490 INFO: promoted self to leader because I had the session lock
2023-12-27 18:33:39,492 INFO: cleared rewind state after becoming the leader
2023-12-27 18:33:40,692 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 610, in get_postgresql_status
    row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 592, in query
    return self.server.query(sql, *params)
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 680, in query
    raise e
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 676, in query
    cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

2023-12-27 18:33:40,692 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 676, in query
    cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

2023-12-27 18:33:59,476 WARNING: Postgresql is not running.
2023-12-27 18:33:59,476 INFO: Lock owner: pg-node-0; I am pg-node-0
2023-12-27 18:33:59,479 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202107181
  Database system identifier: 7277219729343958140
  Database cluster state: in archive recovery
  pg_control last modified: Wed Dec 27 18:33:38 2023
  Latest checkpoint location: 588/D8000058
  Latest checkpoint's REDO location: 588/D8000058
  Latest checkpoint's REDO WAL file: 0000000100000588000000D8
  Latest checkpoint's TimeLineID: 1
  Latest checkpoint's PrevTimeLineID: 1
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:166852302
  Latest checkpoint's NextOID: 621136355
  Latest checkpoint's NextMultiXactId: 17
  Latest checkpoint's NextMultiOffset: 33
  Latest checkpoint's oldestXID: 727
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 0
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Wed Dec 27 18:33:30 2023
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 588/D9000000
  Min recovery ending loc's timeline: 1
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 1000
  max_worker_processes setting: 8
  max_wal_senders setting: 5
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: 7ee5755629ccc8cba5a74cfde74fecfe6abb72475ab6043d0aa9f594ec494f3e

2023-12-27 18:33:59,480 INFO: Lock owner: pg-node-0; I am pg-node-0
2023-12-27 18:33:59,483 INFO: starting as readonly because i had the session lock
2023-12-27 18:33:59,582 INFO: postmaster pid=2445060
2023-12-27 18:34:00,595 INFO: Lock owner: pg-node-0; I am pg-node-0
2023-12-27 18:34:00,595 INFO: establishing a new patroni connection to the postgres cluster
2023-12-27 18:34:00,607 INFO: promoted self to leader because I had the session lock
2023-12-27 18:34:00,609 INFO: cleared rewind state after becoming the leader
2023-12-27 18:34:01,735 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 676, in query
    cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 610, in get_postgresql_status
    row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 592, in query
    return self.server.query(sql, *params)
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 681, in query
    raise PostgresConnectionException('connection problems')
patroni.exceptions.PostgresConnectionException: 'connection problems'
2023-12-27 18:34:01,736 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 675, in query
    with self.patroni.postgresql.connection().cursor() as cursor:
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 255, in connection
    return self._connection.get()
  File "/usr/lib/python3/dist-packages/patroni/postgresql/connection.py", line 24, in get
    self._connection = psycopg.connect(**self._conn_kwargs)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "<primary_ip>", port 5432 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 610, in get_postgresql_status
    row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 592, in query
    return self.server.query(sql, *params)
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 681, in query
    raise PostgresConnectionException('connection problems')
patroni.exceptions.PostgresConnectionException: 'connection problems'
2023-12-27 18:34:10,596 INFO: Lock owner: pg-node-0; I am pg-node-0
2023-12-27 18:34:10,599 INFO: establishing a new patroni connection to the postgres cluster
2023-12-27 18:34:10,700 INFO: establishing a new patroni connection to the postgres cluster
2023-12-27 18:34:10,700 WARNING: Retry got exception: 'connection problems'
2023-12-27 18:34:10,706 INFO: updated leader lock during promote
2023-12-27 18:34:10,746 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 675, in query
    with self.patroni.postgresql.connection().cursor() as cursor:
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 255, in connection
    return self._connection.get()
  File "/usr/lib/python3/dist-packages/patroni/postgresql/connection.py", line 24, in get
    self._connection = psycopg.connect(**self._conn_kwargs)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "<primary_ip>", port 5432 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 610, in get_postgresql_status
    row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 592, in query
    return self.server.query(sql, *params)
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 681, in query
    raise PostgresConnectionException('connection problems')
patroni.exceptions.PostgresConnectionException: 'connection problems'
2023-12-27 18:34:10,746 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 675, in query
    with self.patroni.postgresql.connection().cursor() as cursor:
  File "/usr/lib/python3/dist-packages/patroni/postgresql/__init__.py", line 255, in connection
    return self._connection.get()
  File "/usr/lib/python3/dist-packages/patroni/postgresql/connection.py", line 24, in get
    self._connection = psycopg.connect(**self._conn_kwargs)
  File "/usr/lib/python3/dist-packages/psycopg2/__init__.py", line 122, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "<primary_ip>", port 5432 failed: Connection refused
	Is the server running on that host and accepting TCP/IP connections?


2023-12-27 18:34:20,600 INFO: Lock owner: pg-node-0; I am pg-node-0
2023-12-27 18:34:20,603 INFO: starting as readonly because i had the session lock
2023-12-27 18:34:20,701 INFO: postmaster pid=2445122
2023-12-27 18:34:21,715 INFO: Lock owner: pg-node-0; I am pg-node-0
2023-12-27 18:34:21,715 INFO: establishing a new patroni connection to the postgres cluster
2023-12-27 18:34:21,726 INFO: promoted self to leader because I had the session lock
2023-12-27 18:34:21,726 INFO: cleared rewind state after becoming the leader
2023-12-27 18:34:22,759 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 610, in get_postgresql_status
    row = self.query(stmt.format(postgresql.wal_name, postgresql.lsn_name), retry=retry)[0]
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 592, in query
    return self.server.query(sql, *params)
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 680, in query
    raise e
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 676, in query
    cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

2023-12-27 18:34:22,759 ERROR: get_postgresql_status
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/api.py", line 676, in query
    cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.

pg-node-1 (Replica)

2023-12-27 18:33:18,778 INFO: Selected new etcd server http://172.22.46.174:2379
2023-12-27 18:33:18,795 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:33:19,823 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:33:28,283 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:33:38,275 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:33:39,493 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:33:49,494 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:33:59,502 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:34:00,613 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:34:10,615 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:34:20,620 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:34:21,732 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:34:31,734 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:34:41,739 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:34:42,860 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:34:52,860 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:35:02,865 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:35:03,979 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:35:13,979 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:35:23,985 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:35:25,101 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:35:35,100 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:35:45,107 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:35:46,231 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:35:56,222 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:36:06,228 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:36:07,350 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:36:17,342 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
2023-12-27 18:36:27,348 INFO: no action. I am (pg-node-1), a secondary, and following a leader (pg-node-0)
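
The pg-node-0 log above repeats one cycle: Patroni starts the postmaster, promotes it, and PostgreSQL goes down again, while pg-node-1 keeps logging "no action". A small sketch like the one below can measure that cycle from the Patroni log. The sample lines are copied from the pg-node-0 log in this report; the function names are hypothetical, and the regular expression assumes the default Patroni log line layout seen here.

```python
import re
from datetime import datetime

# Matches lines such as "2023-12-27 18:33:39,490 INFO: promoted self to leader ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+): (.*)$")
PID_RE = re.compile(r"^postmaster pid=(\d+)$")

# Lines copied from the pg-node-0 Patroni log in this report.
SAMPLE_LEADER_LOG = """\
2023-12-27 18:33:38,462 INFO: postmaster pid=2444999
2023-12-27 18:33:39,490 INFO: promoted self to leader because I had the session lock
2023-12-27 18:33:59,476 WARNING: Postgresql is not running.
2023-12-27 18:33:59,582 INFO: postmaster pid=2445060
2023-12-27 18:34:00,607 INFO: promoted self to leader because I had the session lock
2023-12-27 18:34:20,701 INFO: postmaster pid=2445122
2023-12-27 18:34:21,726 INFO: promoted self to leader because I had the session lock
"""


def parse(log_text):
    """Return (timestamp, level, message) tuples for lines that match LINE_RE."""
    events = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m.group(2), m.group(3)))
    return events


def promotion_gaps(events):
    """Seconds between consecutive 'promoted self to leader' messages."""
    times = [ts for ts, _, msg in events if msg.startswith("promoted self to leader")]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def postmaster_pids(events):
    """PIDs of every postmaster that Patroni reported starting."""
    pids = []
    for _, _, msg in events:
        m = PID_RE.match(msg)
        if m:
            pids.append(int(m.group(1)))
    return pids


if __name__ == "__main__":
    events = parse(SAMPLE_LEADER_LOG)
    print("postmaster starts:", postmaster_pids(events))
    print("seconds between promotions:", promotion_gaps(events))
```

On these sample lines each cycle takes about 21 seconds, which is shorter than the configured ttl of 30 seconds. One possible reading, not confirmed in this report, is that the leader lock is refreshed on every cycle before it can expire, which would match the replica never attempting to take over.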

PostgreSQL log files

pg-node-0

2023-12-27 18:33:18 IST [2556450]: user=,db=,app=,client=LOG:  all server processes terminated; reinitializing
2023-12-27 18:33:19 IST [2444952]: user=,db=,app=,client=LOG:  database system was interrupted; last known up at 2023-12-27 18:24:32 IST
2023-12-27 18:33:19 IST [2444953]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is in recovery mode
2023-12-27 18:33:19 IST [2444956]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is in recovery mode
2023-12-27 18:33:19 IST [2444957]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is in recovery mode
2023-12-27 18:33:19 IST [2444958]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is in recovery mode
2023-12-27 18:33:19 IST [2444960]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is in recovery mode
2023-12-27 18:33:20 IST [2444952]: user=,db=,app=,client=LOG:  database system was not properly shut down; automatic recovery in progress
2023-12-27 18:33:20 IST [2444952]: user=,db=,app=,client=LOG:  redo starts at 588/BA161170
2023-12-27 18:33:21 IST [2444963]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is in recovery mode
2023-12-27 18:33:21 IST [2444964]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is in recovery mode
2023-12-27 18:33:21 IST [2444952]: user=,db=,app=,client=LOG:  redo done at 588/D7FFFF28 system usage: CPU: user: 0.85 s, system: 0.16 s, elapsed: 1.03 s
2023-12-27 18:33:21 IST [2444952]: user=,db=,app=,client=PANIC:  could not write to file "pg_wal/xlogtemp.2444952": No space left on device
2023-12-27 18:33:21 IST [2556450]: user=,db=,app=,client=LOG:  startup process (PID 2444952) was terminated by signal 6: Aborted
2023-12-27 18:33:21 IST [2556450]: user=,db=,app=,client=LOG:  aborting startup due to startup process failure
2023-12-27 18:33:21 IST [2556450]: user=,db=,app=,client=LOG:  database system is shut down
2023-12-27 18:33:38 IST [2444999]: user=,db=,app=,client=LOG:  starting PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-12-27 18:33:38 IST [2444999]: user=,db=,app=,client=LOG:  listening on IPv4 address "<primary_ip>", port 5432
2023-12-27 18:33:38 IST [2444999]: user=,db=,app=,client=LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-12-27 18:33:38 IST [2445002]: user=,db=,app=,client=LOG:  database system shutdown was interrupted; last known up at 2023-12-27 18:33:30 IST
2023-12-27 18:33:38 IST [2445002]: user=,db=,app=,client=WARNING:  specified neither primary_conninfo nor restore_command
2023-12-27 18:33:38 IST [2445002]: user=,db=,app=,client=HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-12-27 18:33:38 IST [2445002]: user=,db=,app=,client=LOG:  entering standby mode
2023-12-27 18:33:38 IST [2445002]: user=,db=,app=,client=LOG:  database system was not properly shut down; automatic recovery in progress
2023-12-27 18:33:38 IST [2445002]: user=,db=,app=,client=LOG:  redo starts at 588/D80000D0
2023-12-27 18:33:38 IST [2445002]: user=,db=,app=,client=LOG:  consistent recovery state reached at 588/D9000000
2023-12-27 18:33:38 IST [2444999]: user=,db=,app=,client=LOG:  database system is ready to accept read-only connections
2023-12-27 18:33:39 IST [2445002]: user=,db=,app=,client=LOG:  received promote request
2023-12-27 18:33:39 IST [2445002]: user=,db=,app=,client=LOG:  redo done at 588/D80000D0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.73 s
2023-12-27 18:33:39 IST [2445002]: user=,db=,app=,client=LOG:  selected new timeline ID: 2
2023-12-27 18:33:39 IST [2445002]: user=,db=,app=,client=FATAL:  could not write to file "pg_wal/xlogtemp.2445002": No space left on device
2023-12-27 18:33:39 IST [2444999]: user=,db=,app=,client=LOG:  startup process (PID 2445002) exited with exit code 1
2023-12-27 18:33:39 IST [2444999]: user=,db=,app=,client=LOG:  terminating any other active server processes
2023-12-27 18:33:39 IST [2444999]: user=,db=,app=,client=LOG:  shutting down due to startup process failure
2023-12-27 18:33:39 IST [2444999]: user=,db=,app=,client=LOG:  database system is shut down
2023-12-27 18:33:59 IST [2445060]: user=,db=,app=,client=LOG:  starting PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-12-27 18:33:59 IST [2445060]: user=,db=,app=,client=LOG:  listening on IPv4 address "<primary_ip>", port 5432
2023-12-27 18:33:59 IST [2445060]: user=,db=,app=,client=LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-12-27 18:33:59 IST [2445063]: user=,db=,app=,client=LOG:  database system was interrupted while in recovery at log time 2023-12-27 18:33:30 IST
2023-12-27 18:33:59 IST [2445063]: user=,db=,app=,client=HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-12-27 18:33:59 IST [2445063]: user=,db=,app=,client=WARNING:  specified neither primary_conninfo nor restore_command
2023-12-27 18:33:59 IST [2445063]: user=,db=,app=,client=HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-12-27 18:33:59 IST [2445063]: user=,db=,app=,client=LOG:  entering standby mode
2023-12-27 18:33:59 IST [2445063]: user=,db=,app=,client=LOG:  redo starts at 588/D80000D0
2023-12-27 18:33:59 IST [2445063]: user=,db=,app=,client=LOG:  consistent recovery state reached at 588/D9000000
2023-12-27 18:33:59 IST [2445060]: user=,db=,app=,client=LOG:  database system is ready to accept read-only connections
2023-12-27 18:34:00 IST [2445063]: user=,db=,app=,client=LOG:  received promote request
2023-12-27 18:34:00 IST [2445063]: user=,db=,app=,client=LOG:  redo done at 588/D80000D0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.72 s
2023-12-27 18:34:00 IST [2445063]: user=,db=,app=,client=LOG:  selected new timeline ID: 2
2023-12-27 18:34:00 IST [2445063]: user=,db=,app=,client=FATAL:  could not write to file "pg_wal/xlogtemp.2445063": No space left on device
2023-12-27 18:34:00 IST [2445060]: user=,db=,app=,client=LOG:  startup process (PID 2445063) exited with exit code 1
2023-12-27 18:34:00 IST [2445060]: user=,db=,app=,client=LOG:  terminating any other active server processes
2023-12-27 18:34:00 IST [2445060]: user=,db=,app=,client=LOG:  shutting down due to startup process failure
2023-12-27 18:34:00 IST [2445060]: user=,db=,app=,client=LOG:  database system is shut down
2023-12-27 18:34:20 IST [2445122]: user=,db=,app=,client=LOG:  starting PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-12-27 18:34:20 IST [2445122]: user=,db=,app=,client=LOG:  listening on IPv4 address "<primary_ip>", port 5432
2023-12-27 18:34:20 IST [2445122]: user=,db=,app=,client=LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-12-27 18:34:20 IST [2445125]: user=,db=,app=,client=LOG:  database system was interrupted while in recovery at log time 2023-12-27 18:33:30 IST
2023-12-27 18:34:20 IST [2445125]: user=,db=,app=,client=HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-12-27 18:34:20 IST [2445125]: user=,db=,app=,client=WARNING:  specified neither primary_conninfo nor restore_command
2023-12-27 18:34:20 IST [2445125]: user=,db=,app=,client=HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-12-27 18:34:20 IST [2445125]: user=,db=,app=,client=LOG:  entering standby mode
2023-12-27 18:34:20 IST [2445125]: user=,db=,app=,client=LOG:  redo starts at 588/D80000D0
2023-12-27 18:34:20 IST [2445125]: user=,db=,app=,client=LOG:  consistent recovery state reached at 588/D9000000
2023-12-27 18:34:21 IST [2445122]: user=,db=,app=,client=LOG:  database system is ready to accept read-only connections
2023-12-27 18:34:21 IST [2445125]: user=,db=,app=,client=LOG:  received promote request
2023-12-27 18:34:21 IST [2445125]: user=,db=,app=,client=LOG:  redo done at 588/D80000D0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.72 s
2023-12-27 18:34:21 IST [2445125]: user=,db=,app=,client=LOG:  selected new timeline ID: 2
2023-12-27 18:34:21 IST [2445125]: user=,db=,app=,client=FATAL:  could not write to file "pg_wal/xlogtemp.2445125": No space left on device
2023-12-27 18:34:21 IST [2445122]: user=,db=,app=,client=LOG:  startup process (PID 2445125) exited with exit code 1
2023-12-27 18:34:21 IST [2445122]: user=,db=,app=,client=LOG:  terminating any other active server processes
2023-12-27 18:34:21 IST [2445122]: user=,db=,app=,client=LOG:  shutting down due to startup process failure
2023-12-27 18:34:21 IST [2445122]: user=,db=,app=,client=LOG:  database system is shut down
2023-12-27 18:34:41 IST [2445182]: user=,db=,app=,client=LOG:  starting PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-12-27 18:34:41 IST [2445182]: user=,db=,app=,client=LOG:  listening on IPv4 address "<primary_ip>", port 5432
2023-12-27 18:34:41 IST [2445182]: user=,db=,app=,client=LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-12-27 18:34:41 IST [2445185]: user=,db=,app=,client=LOG:  database system was interrupted while in recovery at log time 2023-12-27 18:33:30 IST
2023-12-27 18:34:41 IST [2445185]: user=,db=,app=,client=HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-12-27 18:34:42 IST [2445185]: user=,db=,app=,client=WARNING:  specified neither primary_conninfo nor restore_command
2023-12-27 18:34:42 IST [2445185]: user=,db=,app=,client=HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-12-27 18:34:42 IST [2445185]: user=,db=,app=,client=LOG:  entering standby mode
2023-12-27 18:34:42 IST [2445185]: user=,db=,app=,client=LOG:  redo starts at 588/D80000D0
2023-12-27 18:34:42 IST [2445185]: user=,db=,app=,client=LOG:  consistent recovery state reached at 588/D9000000
2023-12-27 18:34:42 IST [2445182]: user=,db=,app=,client=LOG:  database system is ready to accept read-only connections
2023-12-27 18:34:42 IST [2445185]: user=,db=,app=,client=LOG:  received promote request
2023-12-27 18:34:42 IST [2445185]: user=,db=,app=,client=LOG:  redo done at 588/D80000D0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.72 s
2023-12-27 18:34:42 IST [2445185]: user=,db=,app=,client=LOG:  selected new timeline ID: 2
2023-12-27 18:34:42 IST [2445185]: user=,db=,app=,client=FATAL:  could not write to file "pg_wal/xlogtemp.2445185": No space left on device
2023-12-27 18:34:42 IST [2445182]: user=,db=,app=,client=LOG:  startup process (PID 2445185) exited with exit code 1
2023-12-27 18:34:42 IST [2445182]: user=,db=,app=,client=LOG:  terminating any other active server processes
2023-12-27 18:34:42 IST [2445182]: user=,db=,app=,client=LOG:  shutting down due to startup process failure
2023-12-27 18:34:42 IST [2445182]: user=,db=,app=,client=LOG:  database system is shut down
2023-12-27 18:35:03 IST [2445247]: user=,db=,app=,client=LOG:  starting PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-12-27 18:35:03 IST [2445247]: user=,db=,app=,client=LOG:  listening on IPv4 address "<primary_ip>", port 5432
2023-12-27 18:35:03 IST [2445247]: user=,db=,app=,client=LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=LOG:  database system was interrupted while in recovery at log time 2023-12-27 18:33:30 IST
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=WARNING:  specified neither primary_conninfo nor restore_command
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=LOG:  entering standby mode
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=LOG:  redo starts at 588/D80000D0
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=LOG:  consistent recovery state reached at 588/D9000000
2023-12-27 18:35:03 IST [2445247]: user=,db=,app=,client=LOG:  database system is ready to accept read-only connections
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=LOG:  received promote request
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=LOG:  redo done at 588/D80000D0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.72 s
2023-12-27 18:35:03 IST [2445250]: user=,db=,app=,client=LOG:  selected new timeline ID: 2
2023-12-27 18:35:04 IST [2445250]: user=,db=,app=,client=FATAL:  could not write to file "pg_wal/xlogtemp.2445250": No space left on device
2023-12-27 18:35:04 IST [2445247]: user=,db=,app=,client=LOG:  startup process (PID 2445250) exited with exit code 1
2023-12-27 18:35:04 IST [2445247]: user=,db=,app=,client=LOG:  terminating any other active server processes
2023-12-27 18:35:04 IST [2445247]: user=,db=,app=,client=LOG:  shutting down due to startup process failure
2023-12-27 18:35:04 IST [2445247]: user=,db=,app=,client=LOG:  database system is shut down
2023-12-27 18:35:24 IST [2445311]: user=,db=,app=,client=LOG:  starting PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-12-27 18:35:24 IST [2445311]: user=,db=,app=,client=LOG:  listening on IPv4 address "<primary_ip>", port 5432
2023-12-27 18:35:24 IST [2445311]: user=,db=,app=,client=LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-12-27 18:35:24 IST [2445314]: user=,db=,app=,client=LOG:  database system was interrupted while in recovery at log time 2023-12-27 18:33:30 IST
2023-12-27 18:35:24 IST [2445314]: user=,db=,app=,client=HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-12-27 18:35:24 IST [2445314]: user=,db=,app=,client=WARNING:  specified neither primary_conninfo nor restore_command
2023-12-27 18:35:24 IST [2445314]: user=,db=,app=,client=HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-12-27 18:35:24 IST [2445314]: user=,db=,app=,client=LOG:  entering standby mode
2023-12-27 18:35:24 IST [2445314]: user=,db=,app=,client=LOG:  redo starts at 588/D80000D0
2023-12-27 18:35:24 IST [2445314]: user=,db=,app=,client=LOG:  consistent recovery state reached at 588/D9000000
2023-12-27 18:35:24 IST [2445311]: user=,db=,app=,client=LOG:  database system is ready to accept read-only connections
2023-12-27 18:35:25 IST [2445314]: user=,db=,app=,client=LOG:  received promote request
2023-12-27 18:35:25 IST [2445314]: user=,db=,app=,client=LOG:  redo done at 588/D80000D0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.72 s
2023-12-27 18:35:25 IST [2445314]: user=,db=,app=,client=LOG:  selected new timeline ID: 2
2023-12-27 18:35:25 IST [2445314]: user=,db=,app=,client=FATAL:  could not write to file "pg_wal/xlogtemp.2445314": No space left on device
2023-12-27 18:35:25 IST [2445311]: user=,db=,app=,client=LOG:  startup process (PID 2445314) exited with exit code 1
2023-12-27 18:35:25 IST [2445311]: user=,db=,app=,client=LOG:  terminating any other active server processes
2023-12-27 18:35:25 IST [2445311]: user=,db=,app=,client=LOG:  shutting down due to startup process failure
2023-12-27 18:35:25 IST [2445311]: user=,db=,app=,client=LOG:  database system is shut down
2023-12-27 18:35:45 IST [2445371]: user=,db=,app=,client=LOG:  starting PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-12-27 18:35:45 IST [2445371]: user=,db=,app=,client=LOG:  listening on IPv4 address "<primary_ip>", port 5432
2023-12-27 18:35:45 IST [2445371]: user=,db=,app=,client=LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-12-27 18:35:45 IST [2445374]: user=,db=,app=,client=LOG:  database system was interrupted while in recovery at log time 2023-12-27 18:33:30 IST
2023-12-27 18:35:45 IST [2445374]: user=,db=,app=,client=HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-12-27 18:35:45 IST [2445377]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is starting up
2023-12-27 18:35:45 IST [2445378]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is starting up
2023-12-27 18:35:45 IST [2445374]: user=,db=,app=,client=WARNING:  specified neither primary_conninfo nor restore_command
2023-12-27 18:35:45 IST [2445374]: user=,db=,app=,client=HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-12-27 18:35:45 IST [2445374]: user=,db=,app=,client=LOG:  entering standby mode
2023-12-27 18:35:45 IST [2445374]: user=,db=,app=,client=LOG:  redo starts at 588/D80000D0
2023-12-27 18:35:45 IST [2445374]: user=,db=,app=,client=LOG:  consistent recovery state reached at 588/D9000000
2023-12-27 18:35:45 IST [2445371]: user=,db=,app=,client=LOG:  database system is ready to accept read-only connections
2023-12-27 18:35:46 IST [2445374]: user=,db=,app=,client=LOG:  received promote request
2023-12-27 18:35:46 IST [2445374]: user=,db=,app=,client=LOG:  redo done at 588/D80000D0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.73 s
2023-12-27 18:35:46 IST [2445374]: user=,db=,app=,client=LOG:  selected new timeline ID: 2
2023-12-27 18:35:46 IST [2445374]: user=,db=,app=,client=FATAL:  could not write to file "pg_wal/xlogtemp.2445374": No space left on device
2023-12-27 18:35:46 IST [2445371]: user=,db=,app=,client=LOG:  startup process (PID 2445374) exited with exit code 1
2023-12-27 18:35:46 IST [2445371]: user=,db=,app=,client=LOG:  terminating any other active server processes
2023-12-27 18:35:46 IST [2445371]: user=,db=,app=,client=LOG:  shutting down due to startup process failure
2023-12-27 18:35:46 IST [2445371]: user=,db=,app=,client=LOG:  database system is shut down
2023-12-27 18:36:06 IST [2445433]: user=,db=,app=,client=LOG:  starting PostgreSQL 14.10 (Ubuntu 14.10-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-12-27 18:36:06 IST [2445433]: user=,db=,app=,client=LOG:  listening on IPv4 address "<primary_ip>", port 5432
2023-12-27 18:36:06 IST [2445433]: user=,db=,app=,client=LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2023-12-27 18:36:06 IST [2445436]: user=,db=,app=,client=LOG:  database system was interrupted while in recovery at log time 2023-12-27 18:33:30 IST
2023-12-27 18:36:06 IST [2445436]: user=,db=,app=,client=HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-12-27 18:36:06 IST [2445439]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is starting up
2023-12-27 18:36:06 IST [2445440]: user=postgres,db=postgres,app=[unknown],client=<primary_ip>FATAL:  the database system is starting up
2023-12-27 18:36:06 IST [2445436]: user=,db=,app=,client=WARNING:  specified neither primary_conninfo nor restore_command
2023-12-27 18:36:06 IST [2445436]: user=,db=,app=,client=HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2023-12-27 18:36:06 IST [2445436]: user=,db=,app=,client=LOG:  entering standby mode
2023-12-27 18:36:06 IST [2445436]: user=,db=,app=,client=LOG:  redo starts at 588/D80000D0
2023-12-27 18:36:06 IST [2445436]: user=,db=,app=,client=LOG:  consistent recovery state reached at 588/D9000000
2023-12-27 18:36:06 IST [2445433]: user=,db=,app=,client=LOG:  database system is ready to accept read-only connections

pg-node-1

2023-12-27 18:33:17 IST [2736117]: user=,db=,app=,client=FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
2023-12-27 18:33:17 IST [2736113]: user=,db=,app=,client=LOG:  unexpected pageaddr 588/5C000000 in log segment 0000000100000588000000D8, offset 0
2023-12-27 18:33:17 IST [502192]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: FATAL:  the database system is in recovery mode
2023-12-27 18:33:22 IST [502202]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:33:27 IST [502215]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:33:32 IST [502234]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:33:37 IST [502247]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:33:42 IST [502258]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:33:47 IST [502267]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:33:52 IST [502274]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:33:57 IST [502281]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:02 IST [502292]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:07 IST [502299]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:12 IST [502306]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:17 IST [502315]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:22 IST [502323]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:27 IST [502330]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:32 IST [502341]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:37 IST [502348]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:42 IST [502356]: user=,db=,app=,client=LOG:  started streaming WAL from primary at 588/D8000000 on timeline 1
2023-12-27 18:34:42 IST [2736113]: user=,db=,app=,client=LOG:  successfully skipped missing contrecord at 588/D7FFFFB0, overwritten at 2023-12-27 18:33:30.167074+05:30
2023-12-27 18:34:42 IST [2736113]: user=,db=,app=,client=CONTEXT:  WAL redo at 588/D8000028 for XLOG/OVERWRITE_CONTRECORD: lsn 588/D7FFFFB0; time 2023-12-27 18:33:30.167074+05:30
2023-12-27 18:34:42 IST [502356]: user=,db=,app=,client=FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
                This probably means the server terminated abnormally
                before or while processing the request.
2023-12-27 18:34:47 IST [2736113]: user=,db=,app=,client=LOG:  unexpected pageaddr 588/5D000000 in log segment 0000000100000588000000D9, offset 0
2023-12-27 18:34:47 IST [502365]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:52 IST [502372]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:34:57 IST [502379]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:02 IST [502391]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:07 IST [502401]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:12 IST [502408]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:17 IST [502417]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:22 IST [502424]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:27 IST [502434]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:32 IST [502444]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:37 IST [502451]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:42 IST [502458]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:47 IST [502470]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:52 IST [502477]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:35:57 IST [502484]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:02 IST [502493]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:07 IST [502503]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:12 IST [502510]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:17 IST [502519]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:22 IST [502526]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:27 IST [502534]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: FATAL:  the database system is starting up
2023-12-27 18:36:32 IST [502545]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:37 IST [502553]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:42 IST [502560]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:47 IST [502569]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:52 IST [502579]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:36:57 IST [502586]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:02 IST [502595]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:07 IST [502602]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:12 IST [502612]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:17 IST [502621]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:22 IST [502628]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:27 IST [502635]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:32 IST [502647]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:37 IST [502655]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:42 IST [502662]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?
2023-12-27 18:37:47 IST [502671]: user=,db=,app=,client=FATAL:  could not connect to the primary server: connection to server at "<primary_ip>", port 5432 failed: Connection refused
                Is the server running on that host and accepting TCP/IP connections?

Have you tried to use GitHub issue search?

  • Yes

Anything else we need to know?

The replica server has half the CPU and RAM capacity of the leader server.

Switchover to standby should have happened

Monitoring free disk space is not Patroni's job.
If the Postgres primary crashes (which usually happens when Postgres can't write to disk), Patroni tries to start it up again.
This behavior can be changed by setting primary_start_timeout to 0.
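
For illustration, this is a minimal sketch of where that setting could go, assuming the same layout as the configuration file in the report. All values except primary_start_timeout are copied from that file. Two assumptions to verify against the documentation for your release: the bootstrap.dcs section is only applied when the cluster is first initialized, so an existing cluster would need the change made in the dynamic configuration (for example with patronictl edit-config), and on releases as old as 2.1.3 the setting may still go by its earlier name, master_start_timeout.

```yaml
# Sketch only: fragment of the dynamic (DCS) configuration.
# Everything except primary_start_timeout is taken from the reported config.
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    # 0 means: do not spend time restarting a crashed primary in place;
    # release the leader lock so that a healthy replica can take over.
    # Assumption: older releases may call this master_start_timeout.
    primary_start_timeout: 0
```

With a non-zero value, Patroni keeps retrying the local start for up to that many seconds before giving up the leader role, which matches the repeated start and shutdown cycles in the primary's log above.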

Patroni version: 2.1.3

Please upgrade to the latest version