zalando / patroni

A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes

Patroni synchronous replication not working

skizilbash opened this issue

What happened?

I have a 3-node cluster and I am trying to set up synchronous replication between nodes 1 and 2. Node 3 is in a different DC and can remain asynchronous. I made the changes below as per the documentation, but instead of node 2 reaching the sync state, it becomes the Leader and the former primary becomes a Replica. I first made the change through edit-config, and it didn't work; then I made the change directly in the patroni.yml file on node 1, and it still didn't work. I've tried with and without single/double quotes, but it still doesn't work.

Any ideas?

==========================================================================
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
synchronous_mode: "on"
slots:
percona_cluster_1:
type: physical

postgresql:
  use_pg_rewind: true
  use_slots: true
  parameters:
    wal_level: replica
    hot_standby: on
    synchronous_commit: "on"
    synchronous_standby_names: "'patroni-pgcluster-qa02'"
    wal_keep_segments: 10
    max_wal_senders: 5
    max_replication_slots: 10
    wal_log_hints: on
    logging_collector: 'on'

postgres=# show synchronous_standby_names;
 synchronous_standby_names
---------------------------

(1 row)

root@patroni-pgcluster-qa01:/etc/patroni# patronictl -c /etc/patroni/patroni.yml list cluster_qa

+ Cluster: cluster_qa (7354016880499832511) ------+-----------+----+-----------+------------------+
| Member                 | Host         | Role    | State     | TL | Lag in MB | Tags             |
+------------------------+--------------+---------+-----------+----+-----------+------------------+
| patroni-pgcluster-qa01 | 10.10.11.140 | Replica | streaming | 24 |         0 |                  |
| patroni-pgcluster-qa02 | 10.10.11.141 | Leader  | running   | 24 |           |                  |
| patroni-pgcluster-qa03 | 10.10.11.142 | Replica | streaming | 24 |         0 | nofailover: true |
+------------------------+--------------+---------+-----------+----+-----------+------------------+

How can we reproduce it (as minimally and precisely as possible)?

n/a

What did you expect to happen?

Node 2 to be listed as a synchronous standby, i.e. a row like:

| Sync Standby | running | 1 | 0 |

Patroni/PostgreSQL/DCS version

  • Patroni version: 3.2.2
  • PostgreSQL version: 15
  • DCS (and its version):

Patroni configuration file

root@patroni-pgcluster-qa01:/etc/patroni# cat patroni.yml

#namespace: hoppa
scope: cluster_qa
name: patroni-pgcluster-qa01

# log:
#   level: DEBUG
#   traceback_level: DEBUG
#   #dir: /var/log/postgresql
#   loggers:
#     patroni.postmaster: DEBUG
#     urllib3: DEBUG


restapi:
    listen: 0.0.0.0:8008
    connect_address: 10.10.11.140:8008

etcd3:
    hosts: 10.10.11.140:2379,10.10.11.141:2379,10.10.11.142:2379

bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    synchronous_mode: "on"
    slots:
      percona_cluster_1:
      type: physical

    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: on
        synchronous_commit: "on"
        synchronous_standby_names: "'patroni-pgcluster-qa02'"
        wal_keep_segments: 10
        max_wal_senders: 5
        max_replication_slots: 10
        wal_log_hints: on
        logging_collector: 'on'

  # some desired options for 'initdb'
  initdb: # Note: It needs to be a list (some options need values, others are switches)
    - encoding: UTF8
    - data-checksums

#  pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
    #   - host replication replicator 127.0.0.1/32 trust
 #   - local   all             all                                     peer
 #   - host    all             all             127.0.0.1/32            scram-sha-256
 #   - host    all             all             10.10.11.140/32         scram-sha-256
 #   - host    all             all             10.10.11.141/32         scram-sha-256
 #   - host    all             all             10.10.11.142/32         scram-sha-256
 #   - host    all             all             10.10.12.253/32         scram-sha-256
 #   - local   replication     all                                     peer
 #   - host    replication     all             127.0.0.1/32            scram-sha-256
 #   - host    replication     replicator     10.10.11.140/32          scram-sha-256
 #   - host    replication     replicator     10.10.11.141/32          scram-sha-256
 #   - host    replication     replicator     10.10.11.143/32          scram-sha-256

  # Some additional users which need to be created after initializing the new cluster
  users:
    admin:
      password: qaz123
      options:
        - createrole
        - createdb
    percona:
      password: qaz123
      options:
        - createrole
        - createdb 

postgresql:
  cluster_name: cluster_qa
  listen: 0.0.0.0:5432
  connect_address: 10.10.11.140:5432
  data_dir: /var/lib/postgresql/15/main
  bin_dir: /usr/lib/postgresql/15/bin
 # config_dir: /etc/postgresql/15/main
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replPasswd
    superuser:
      username: postgres
      password: qaz123
  parameters:
    unix_socket_directories: /var/run/postgresql/
  create_replica_methods:
    - basebackup
  basebackup:
    checkpoint: 'fast'

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
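Item 2 of the reply at the end of this thread suggests the nosync tag for the member that should stay asynchronous. As a sketch, assuming patroni-pgcluster-qa03 (the node in the other DC) uses a patroni.yml laid out like qa01's file above, its tags section could look like this. The nofailover: true value is taken from the patronictl list output; the other values are copied from qa01 and are assumptions about qa03's file.

```yaml
# Local patroni.yml on patroni-pgcluster-qa03 (not the DCS global config)
tags:
  nofailover: true      # as shown for qa03 in the patronictl list output
  noloadbalance: false
  clonefrom: false
  nosync: true          # this member is never chosen as the synchronous standby
```

Tags are per-node settings, so this change belongs in qa03's own file rather than in edit-config; Patroni on that node has to reload its configuration before the tag takes effect.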

patronictl show-config

root@patroni-pgcluster-qa01:/etc/patroni# patronictl -c /etc/patroni/patroni.yml show-config
dcs:
  loop_wait: 10
  maximum_lag_on_failover: 1048576
  retry_timeout: 10
  slots:
    percona_cluster_1: null
    ttl: 30
    type: physical
loop_wait: 10
maximum_lag_on_failover: 1048576
pg_hba:
- local   all             all                                     peer
- host    all             all             127.0.0.1/32            scram-sha-256
- host    all             all             10.10.11.140/32         scram-sha-256
- host    all             all             10.10.11.141/32         scram-sha-256
- host    all             all             10.10.11.142/32         scram-sha-256
- host    all             all             10.10.11.11/32          scram-sha-256
- local   replication     all                                     peer
- host    replication     all             127.0.0.1/32            scram-sha-256
- host    replication     replicator     10.10.11.140/32          scram-sha-256
- host    replication     replicator     10.10.11.141/32          scram-sha-256
- host    replication     replicator     10.10.11.142/32          scram-sha-256
postgresql:
  parameters:
    hot_standby: true
    logging_collector: 'on'
    max_replication_slots: 10
    max_wal_senders: 5
    wal_keep_segments: 10
    wal_level: replica
    wal_log_hints: true
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
slots:
  percona_cluster_1:
    type: physical
ttl: 30

Patroni log files

Apr 09 17:18:22 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:18:22,366 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:18:32 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:18:32,410 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:18:42 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:18:42,365 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:18:52 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:18:52,410 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:02 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:02,365 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:12 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:12,410 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:22 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:22,365 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:32 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:32,410 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:42 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:42,365 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)

PostgreSQL log files

d by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-04-03 15:04:30.646 CEST [5474] LOG:  listening on IPv6 address "::1", port 5432
2024-04-03 15:04:30.646 CEST [5474] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2024-04-03 15:04:30.647 CEST [5474] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-04-03 15:04:30.650 CEST [5477] LOG:  database system was shut down at 2024-04-03 15:04:29 CEST
2024-04-03 15:04:30.658 CEST [5474] LOG:  database system is ready to accept connections
2024-04-03 15:09:30.668 CEST [5475] LOG:  checkpoint starting: time
2024-04-03 15:09:34.685 CEST [5475] LOG:  checkpoint complete: wrote 43 buffers (0.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=4.014 s, sync=0.001 s, total=4.018 s; sync files=11, longest=0.001 s, average=0.001 s; distance=252 kB, estimate=252 kB
2024-04-03 15:11:26.731 CEST [5474] LOG:  received fast shutdown request
2024-04-03 15:11:26.731 CEST [5474] LOG:  aborting any active transactions
2024-04-03 15:11:26.734 CEST [5474] LOG:  background worker "logical replication launcher" (PID 5480) exited with exit code 1
2024-04-03 15:11:26.736 CEST [5475] LOG:  shutting down
2024-04-03 15:11:26.736 CEST [5475] LOG:  checkpoint starting: shutdown immediate
2024-04-03 15:11:26.739 CEST [5475] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=227 kB

Have you tried to use GitHub issue search?

  • Yes

Anything else we need to know?

n/a

  1. For synchronous mode you are not supposed to manage synchronous_standby_names on your own, but let Patroni do that by setting synchronous_mode: true in the global config (patronictl edit-config). See the documentation for details: https://patroni.readthedocs.io/en/latest/replication_modes.html#synchronous-mode
  2. To prevent a standby from becoming synchronous, you can use the nosync: true tag: https://patroni.readthedocs.io/en/latest/yaml_configuration.html#tags
  3. Your global config is malformed: there should be no dcs section in it.
  4. For the future - please use Slack for questions: https://patroni.readthedocs.io/en/latest/contributing_guidelines.html#chatting
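Putting items 1 and 3 together, the global configuration could look roughly like the sketch below after running patronictl -c /etc/patroni/patroni.yml edit-config. It is based on the show-config output in the issue: the nested dcs key is dropped, synchronous_mode is set to the boolean true, synchronous_standby_names is left out because Patroni manages it, and the slot definition is nested correctly. The pg_hba list is omitted here for brevity and would be kept unchanged.

```yaml
# Global (DCS) configuration as edited with patronictl edit-config.
# All keys sit at the top level; there is no "dcs:" wrapper here.
loop_wait: 10
maximum_lag_on_failover: 1048576
retry_timeout: 10
ttl: 30
synchronous_mode: true        # Patroni picks the sync standby and sets synchronous_standby_names
postgresql:
  parameters:
    hot_standby: true
    logging_collector: 'on'
    max_replication_slots: 10
    max_wal_senders: 5
    wal_keep_segments: 10
    wal_level: replica
    wal_log_hints: true
    # synchronous_standby_names deliberately absent: do not manage it by hand
  use_pg_rewind: true
  use_slots: true
slots:
  percona_cluster_1:
    type: physical            # nested under the slot name, not beside it
```

With qa03 tagged nosync: true, the only candidate left is the other replica, so patronictl list should then report it with the Sync Standby role.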

Thanks! I'll keep in mind to ask questions on Slack :)