zalando / patroni

A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes

RAFT - failed to update leader lock

nrmn2492 opened this issue · comments

What happened?

This issue has occurred several times today, and previously about 3-4 months ago: we observe an unexpected failover without any preceding errors. There is nothing of substance in the logs, and there is no increase in resource demand. We simply see the following:

2024-04-17 12:05:36 +0200 ERROR: failed to update leader lock
2024-04-17 12:05:36 +0200 INFO: Demoting self (immediate-nolock)
2024-04-17 12:05:39 +0200 INFO: demoted self because failed to update leader lock in DCS

I suspect the Raft layer, specifically its handling of the leader lock, and that something is going awry there. However, there is no deeper trace of what might be happening: there is no memory exhaustion and CPU usage is normal; the lock write simply fails, triggering the failover.
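For context, the timestamps in the excerpt line up with the configured timeouts: the last successful leader check is logged exactly 10 seconds before the error, which matches both `loop_wait` and `retry_timeout`. This is consistent with a single DCS round blocking for the full `retry_timeout` (e.g. a transient network or Raft-quorum stall) rather than with local resource pressure. A small sketch of that arithmetic, using the timestamps from the log above:

```python
from datetime import datetime

fmt = "%Y-%m-%d %H:%M:%S"
# Timestamps taken from the Patroni log excerpt in this report.
last_ok = datetime.strptime("2024-04-17 12:05:26", fmt)   # last "Lock owner" line
lock_error = datetime.strptime("2024-04-17 12:05:36", fmt)  # "failed to update leader lock"

retry_timeout = 10  # seconds, from the cluster configuration
gap = (lock_error - last_ok).total_seconds()
print(gap, gap == retry_timeout)  # 10.0 True
```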

How can we reproduce it (as minimally and precisely as possible)?

Occurred randomly; it cannot be reproduced.

What did you expect to happen?

Patroni/PostgreSQL/DCS version

  • Patroni version: 2.1.5
  • PostgreSQL version: 13.14-0+deb11u1
  • DCS (and its version): Raft & pysyncobj: 0.3.11

Patroni configuration file

scope: postgres-cl
namespace: /db/
name: psql13-patroni-1

restapi:
    listen: 0.0.0.0:8008
    connect_address: psql-patroni-1:8008

log:
  level: INFO                           #  NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL
  dir: /var/log/patroni            #  Patroni log dir
  file_size: 16777216                   #  rotate the log at 16 MB
  file_num: 30                          #  keep 30 files
  dateformat: '%Y-%m-%d %H:%M:%S %z'    #  IMPORTANT: drop millisecond timestamps
  format: '%(asctime)s %(levelname)s: %(message)s'

raft:
    self_addr:  psql-patroni-1:2222
    partner_addrs:
    - psql-patroni-2:2222
    - psql-patroni-3:2222
    - psql-patroni-4:2222
    - psql-patroni-5:2222

    data_dir: /opt/patroni/raft/

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32   md5
    - host replication replicator 192.168.0.0/16 md5
    - host replication replicator 172.16.0.0/12 md5
    - host replication replicator 10.0.0.0/8 md5
    - host all         all        192.168.0.0/16 md5
    - host all         all        172.16.0.0/12  md5
    - host all         all        10.0.0.0/8     md5
    users:
      admin:
        password: admin
        options:
          - createrole
          - createdb

postgresql:
    listen: 0.0.0.0:5432
    connect_address:  psql-patroni-1:5432
    bin_dir: /usr/lib/postgresql/13/bin/
    data_dir: /data/patroni
    pgpass: /tmp/pgpass
    authentication:
        replication:
            username: replicator
            password: aaaa
        superuser:
            username: postgres
            password: aaaa
        # rewind:
        #     username: rewind
        #     password: password
    parameters:
        unix_socket_directories: '.'
        max_connections: '500'
        work_mem: 128MB
        maintenance_work_mem: 384MB
        log_temp_files: 0
        shared_buffers: 1591MB

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

patronictl show-config

loop_wait: 10
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    log_temp_files: 0
    maintenance_work_mem: 384MB
    max_connections: '500'
    shared_buffers: 1200MB
    work_mem: 128MB
    zdb.default_elasticsearch_url: http://elastic:asdasd@patroni-elastic-zdb:9200/
  pg_hba:
  - host replication replicator 127.0.0.1/32   md5
  - host replication replicator 192.168.0.0/16 md5
  - host replication replicator 172.16.0.0/12 md5
  - host replication replicator 10.0.0.0/8 md5
  - host all         all        127.0.0.1/32   trust
  - host all         all        192.168.0.0/16 md5
  - host all         all        172.16.0.0/12  md5
  - host all         all        10.0.0.0/8     md5
  use_pg_rewind: true
retry_timeout: 10
ttl: 30
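Worth noting: these values sit exactly at the boundary of Patroni's documented recommendation that `ttl >= loop_wait + 2 * retry_timeout`. With zero slack, one DCS round that exhausts `retry_timeout` already brings the leader key close to expiry. A quick check of the values shown above:

```python
# Values from the cluster configuration shown above.
ttl, loop_wait, retry_timeout = 30, 10, 10

# Patroni's documented recommendation: ttl >= loop_wait + 2 * retry_timeout.
slack = ttl - (loop_wait + 2 * retry_timeout)
print(slack)  # 0 -> the configuration has no margin
```

Raising `ttl` (or lowering `retry_timeout`) would give the leader more headroom to survive a single slow lock update.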

Patroni log files

2024-04-17 12:05:07 +0200 INFO: no action. I am (psql13-patroni-1), the leader with the lock
2024-04-17 12:05:17 +0200 INFO: no action. I am (psql13-patroni-1), the leader with the lock
2024-04-17 12:05:26 +0200 INFO: Lock owner: psql13-patroni-1; I am psql13-patroni-1
2024-04-17 12:05:36 +0200 ERROR: failed to update leader lock
2024-04-17 12:05:36 +0200 INFO: Demoting self (immediate-nolock)
2024-04-17 12:05:39 +0200 INFO: demoted self because failed to update leader lock in DCS
2024-04-17 12:05:39 +0200 WARNING: Loop time exceeded, rescheduling immediately.
2024-04-17 12:05:39 +0200 INFO: closed patroni connection to the postgresql cluster
2024-04-17 12:05:39 +0200 INFO: postmaster pid=228249
2024-04-17 12:05:39 +0200 INFO: Lock owner: psql13-patroni-1; I am psql13-patroni-1
2024-04-17 12:05:49 +0200 INFO: starting after demotion in progress
2024-04-17 12:05:49 +0200 WARNING: Loop time exceeded, rescheduling immediately.
2024-04-17 12:05:49 +0200 INFO: Lock owner: psql13-patroni-2; I am psql13-patroni-1
2024-04-17 12:05:49 +0200 INFO: establishing a new patroni connection to the postgres cluster
2024-04-17 12:05:49 +0200 INFO: Local timeline=34 lsn=48/D2B19568
2024-04-17 12:05:49 +0200 INFO: no action. I am (psql13-patroni-1), a secondary, and following a leader (psql13-patroni-2)
2024-04-17 12:05:49 +0200 INFO: Lock owner: psql13-patroni-2; I am psql13-patroni-1
2024-04-17 12:05:49 +0200 INFO: Local timeline=34 lsn=48/D2B19568
2024-04-17 12:05:49 +0200 INFO: no action. I am (psql13-patroni-1), a secondary, and following a leader (psql13-patroni-2)
2024-04-17 12:05:50 +0200 INFO: Lock owner: psql13-patroni-2; I am psql13-patroni-1
2024-04-17 12:05:50 +0200 INFO: Local timeline=34 lsn=48/D2B19568
2024-04-17 12:05:50 +0200 INFO: no action. I am (psql13-patroni-1), a secondary, and following a leader (psql13-patroni-2)
2024-04-17 12:06:00 +0200 INFO: Lock owner: psql13-patroni-2; I am psql13-patroni-1
2024-04-17 12:06:00 +0200 INFO: Local timeline=35 lsn=48/D3CE9E58
2024-04-17 12:06:00 +0200 INFO: master_timeline=35
2024-04-17 12:06:00 +0200 INFO: no action. I am (psql13-patroni-1), a secondary, and following a leader (psql13-patroni-2)
2024-04-17 12:06:10 +0200 INFO: no action. I am (psql13-patroni-1), a secondary, and following a leader (psql13-patroni-2)

PostgreSQL log files

2024-04-17 12:05:36.980 CEST [3470862] LOG:  received immediate shutdown request
2024-04-17 12:05:36.991 CEST [228152] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.991 CEST [228152] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.991 CEST [228152] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.991 CEST [227713] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.991 CEST [227713] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.991 CEST [227713] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.991 CEST [228140] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.991 CEST [228140] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.991 CEST [228140] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.991 CEST [228094] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.991 CEST [228094] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.991 CEST [228094] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.991 CEST [227734] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.991 CEST [227734] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.991 CEST [227734] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.991 CEST [223742] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.991 CEST [223742] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.991 CEST [223742] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.991 CEST [227200] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.991 CEST [227200] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.991 CEST [227200] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.991 CEST [227789] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.991 CEST [227789] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.991 CEST [227789] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.996 CEST [216148] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.996 CEST [216148] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.996 CEST [216148] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.996 CEST [227788] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.996 CEST [227788] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.996 CEST [227788] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.996 CEST [227787] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.996 CEST [227787] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.996 CEST [227787] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.996 CEST [223912] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.996 CEST [223912] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.996 CEST [223912] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.997 CEST [227784] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.997 CEST [227784] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.997 CEST [227784] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:36.998 CEST [223905] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:36.998 CEST [223905] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:36.998 CEST [223905] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.000 CEST [220198] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.000 CEST [220198] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.000 CEST [220198] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.000 CEST [221777] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.000 CEST [221777] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.000 CEST [221777] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.000 CEST [215808] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.000 CEST [215808] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.000 CEST [215808] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.001 CEST [227753] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.001 CEST [227753] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.001 CEST [227753] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.004 CEST [220205] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.004 CEST [220205] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.004 CEST [220205] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.004 CEST [217044] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.004 CEST [217044] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.004 CEST [217044] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.004 CEST [131269] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.004 CEST [131269] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.004 CEST [131269] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.004 CEST [3533518] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.004 CEST [3533518] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.004 CEST [3533518] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.006 CEST [216120] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.006 CEST [216120] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.006 CEST [216120] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.008 CEST [217586] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.008 CEST [217586] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.008 CEST [217586] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.008 CEST [204489] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.008 CEST [204489] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.008 CEST [204489] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.008 CEST [178418] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.008 CEST [178418] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.008 CEST [178418] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.008 CEST [4171323] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.008 CEST [4171323] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.008 CEST [4171323] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.012 CEST [3470885] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.012 CEST [3470885] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.012 CEST [3470885] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.012 CEST [216598] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.012 CEST [216598] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.012 CEST [216598] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.012 CEST [224335] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.012 CEST [224335] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.012 CEST [224335] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.012 CEST [1607256] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.012 CEST [1607256] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.012 CEST [1607256] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.012 CEST [3534290] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.012 CEST [3534290] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.012 CEST [3534290] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.014 CEST [3533883] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.014 CEST [3533883] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.014 CEST [3533883] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.016 CEST [3533496] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.016 CEST [3533496] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.016 CEST [3533496] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.016 CEST [187274] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.016 CEST [187274] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.016 CEST [187274] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.031 CEST [221772] WARNING:  terminating connection because of crash of another server process
2024-04-17 12:05:37.031 CEST [221772] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2024-04-17 12:05:37.031 CEST [221772] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:05:37.048 CEST [228240] FATAL:  the database system is shutting down
2024-04-17 12:05:37.051 CEST [228241] FATAL:  the database system is shutting down
2024-04-17 12:05:37.249 CEST [3470862] LOG:  database system is shut down
localhost:5432 - no response
2024-04-17 12:05:39.686 CEST [228249] LOG:  starting PostgreSQL 13.14 (Debian 13.14-0+deb11u1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2024-04-17 12:05:39.686 CEST [228249] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-04-17 12:05:39.701 CEST [228249] LOG:  listening on Unix socket "./.s.PGSQL.5432"
2024-04-17 12:05:39.739 CEST [228251] LOG:  database system was interrupted; last known up at 2024-04-17 12:05:22 CEST
2024-04-17 12:05:39.792 CEST [228252] FATAL:  the database system is starting up
2024-04-17 12:05:39.797 CEST [228253] FATAL:  the database system is starting up
2024-04-17 12:05:39.797 CEST [228253] LOG:  could not send data to client: Connection reset by peer
2024-04-17 12:05:39.799 CEST [228254] FATAL:  the database system is starting up
2024-04-17 12:05:39.799 CEST [228254] LOG:  could not send data to client: Connection reset by peer
2024-04-17 12:05:40.073 CEST [228251] WARNING:  specified neither primary_conninfo nor restore_command
2024-04-17 12:05:40.073 CEST [228251] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-04-17 12:05:40.073 CEST [228251] LOG:  entering standby mode
2024-04-17 12:05:40.110 CEST [228251] LOG:  database system was not properly shut down; automatic recovery in progress
2024-04-17 12:05:40.129 CEST [228251] LOG:  redo starts at 48/D2B19480
2024-04-17 12:05:40.129 CEST [228251] LOG:  invalid record length at 48/D2B19568: wanted 24, got 0
2024-04-17 12:05:40.137 CEST [228251] LOG:  consistent recovery state reached at 48/D2B19568
2024-04-17 12:05:40.138 CEST [228249] LOG:  database system is ready to accept read only connections
localhost:5432 - accepting connections
localhost:5432 - accepting connections
2024-04-17 12:05:41.079 CEST [228276] LOG:  PID 218062 in cancel request did not match any process
2024-04-17 12:05:41.083 CEST [228277] LOG:  could not send data to client: Connection reset by peer
2024-04-17 12:05:41.083 CEST [228277] FATAL:  connection to client lost
2024-04-17 12:05:45.737 CEST [228260] LOG:  could not send data to client: Broken pipe
2024-04-17 12:05:45.737 CEST [228260] FATAL:  connection to client lost
server signaled
2024-04-17 12:05:49.357 CEST [228249] LOG:  received SIGHUP, reloading configuration files
2024-04-17 12:05:49.360 CEST [228249] LOG:  parameter "primary_conninfo" changed to "user=replicator passfile=/tmp/pgpass host=psql-patroni-2 port=5432 sslmode=prefer application_name=psql13-patroni-1 gssencmode=prefer channel_binding=prefer"
2024-04-17 12:05:49.360 CEST [228249] LOG:  parameter "primary_slot_name" changed to "psql13_patroni_1"
2024-04-17 12:05:49.390 CEST [228322] FATAL:  could not start WAL streaming: ERROR:  replication slot "psql13_patroni_1" does not exist
2024-04-17 12:05:54.137 CEST [228338] LOG:  could not send data to client: Connection reset by peer
2024-04-17 12:05:54.137 CEST [228338] FATAL:  connection to client lost
2024-04-17 12:05:54.380 CEST [228339] LOG:  fetching timeline history file for timeline 35 from primary server
2024-04-17 12:05:54.399 CEST [228339] LOG:  started streaming WAL from primary at 48/D2000000 on timeline 34
2024-04-17 12:05:54.439 CEST [228339] LOG:  replication terminated by primary server
2024-04-17 12:05:54.439 CEST [228339] DETAIL:  End of WAL reached on timeline 34 at 48/D2B19568.
2024-04-17 12:05:54.440 CEST [228251] LOG:  new target timeline is 35
2024-04-17 12:05:54.441 CEST [228339] LOG:  restarted WAL streaming at 48/D2000000 on timeline 35
/usr/local/lib/python3.9/dist-packages/pysyncobj/serializer.py:88: FutureWarning: GzipFile was opened for writing, but this will change in future Python releases.  Specify the mode argument for opening it for writing.
  with gzip.GzipFile(fileobj=f) as g:
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2024-04-17 12:15:20.418 CEST [228339] FATAL:  could not receive data from WAL stream: server closed the connection unexpectedly
		This probably means the server terminated abnormally
		before or while processing the request.
2024-04-17 12:15:20.419 CEST [228251] LOG:  invalid resource manager ID 171 at 48/D3DA5548
2024-04-17 12:15:20.448 CEST [230428] FATAL:  could not connect to the primary server: FATAL:  the database system is shutting down
2024-04-17 12:15:25.464 CEST [230437] LOG:  started streaming WAL from primary at 48/D3000000 on timeline 35
server signaled
2024-04-17 12:15:57.918 CEST [228249] LOG:  received SIGHUP, reloading configuration files
2024-04-17 12:15:57.921 CEST [228249] LOG:  parameter "primary_conninfo" changed to "user=replicator passfile=/tmp/pgpass host=psql-patroni-4 port=5432 sslmode=prefer application_name=psql13-patroni-1 gssencmode=prefer channel_binding=prefer"
2024-04-17 12:15:57.923 CEST [228251] LOG:  WAL receiver process shutdown requested
2024-04-17 12:15:57.923 CEST [230437] FATAL:  terminating walreceiver process due to administrator command
2024-04-17 12:15:57.944 CEST [230563] LOG:  fetching timeline history file for timeline 36 from primary server
2024-04-17 12:15:57.964 CEST [230563] FATAL:  could not start WAL streaming: ERROR:  replication slot "psql13_patroni_1" does not exist
2024-04-17 12:15:57.965 CEST [228251] LOG:  new target timeline is 36
2024-04-17 12:15:57.987 CEST [230565] FATAL:  could not start WAL streaming: ERROR:  replication slot "psql13_patroni_1" does not exist
2024-04-17 12:15:58.009 CEST [230566] FATAL:  could not start WAL streaming: ERROR:  replication slot "psql13_patroni_1" does not exist
2024-04-17 12:16:03.011 CEST [230599] FATAL:  could not start WAL streaming: ERROR:  replication slot "psql13_patroni_1" does not exist
2024-04-17 12:16:08.011 CEST [230633] LOG:  started streaming WAL from primary at 48/D3000000 on timeline 36

Have you tried to use GitHub issue search?

  • Yes

Anything else we need to know?

No response

Occurred randomly; it cannot be reproduced.

That's the main reason we declared "Raft" support as deprecated: https://patroni.readthedocs.io/en/latest/releases.html#version-3-0-0

Please consider using Etcd.
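For reference, migrating the DCS means replacing the `raft:` section of the Patroni configuration with an etcd section, roughly like this (hostnames are placeholders; see the Patroni documentation for the full option list):

```yaml
# Hypothetical sketch: an etcd (v3 API) DCS in place of the raft: section.
etcd3:
  hosts:
  - etcd-1:2379
  - etcd-2:2379
  - etcd-3:2379
```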