Patroni synchronous replication not working
skizilbash opened this issue · comments
What happened?
I have a 3-node cluster and I am trying to set up synchronous replication between nodes 1 and 2. Node 3 is in a different DC and can remain async. I made the changes below as per the docs, but instead of node 2 showing a sync state, node 2 becomes the Leader and the former primary becomes a Replica. I first made the change through edit-config, which didn't work; then I made the change directly in the patroni.yml file on node 1, which didn't work either. I've tried with and without single/double quotes, with no success.
Any ideas?
==========================================================================
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    synchronous_mode: "on"
    slots:
      percona_cluster_1:
        type: physical
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: on
        synchronous_commit: "on"
        synchronous_standby_names: "'patroni-pgcluster-qa02'"
        wal_keep_segments: 10
        max_wal_senders: 5
        max_replication_slots: 10
        wal_log_hints: on
        logging_collector: 'on'
postgres=# show synchronous_standby_names;
 synchronous_standby_names
---------------------------

(1 row)
root@patroni-pgcluster-qa01:/etc/patroni# patronictl -c /etc/patroni/patroni.yml list cluster_qa
+ Cluster: cluster_qa (7354016880499832511) ------+-----------+----+-----------+------------------+
| Member                 | Host         | Role    | State     | TL | Lag in MB | Tags             |
+------------------------+--------------+---------+-----------+----+-----------+------------------+
| patroni-pgcluster-qa01 | 10.10.11.140 | Replica | streaming | 24 |         0 |                  |
| patroni-pgcluster-qa02 | 10.10.11.141 | Leader  | running   | 24 |           |                  |
| patroni-pgcluster-qa03 | 10.10.11.142 | Replica | streaming | 24 |         0 | nofailover: true |
+------------------------+--------------+---------+-----------+----+-----------+------------------+
How can we reproduce it (as minimally and precisely as possible)?
n/a
What did you expect to happen?
| Sync Standby | running | 1 | 0 |
Patroni/PostgreSQL/DCS version
- Patroni version: 3.2.2
- PostgreSQL version: 15
- DCS (and its version):
Patroni configuration file
root@patroni-pgcluster-qa01:/etc/patroni# cat patroni.yml
#namespace: hoppa
scope: cluster_qa
name: patroni-pgcluster-qa01
# log:
#   level: DEBUG
#   traceback_level: DEBUG
#   #dir: /var/log/postgresql
#   loggers:
#     patroni.postmaster: DEBUG
#     urllib3: DEBUG
restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.10.11.140:8008
etcd3:
  hosts: 10.10.11.140:2379,10.10.11.141:2379,10.10.11.142:2379
bootstrap:
  # this section will be written into Etcd:/<namespace>/<scope>/config after initializing new cluster
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    synchronous_mode: "on"
    slots:
      percona_cluster_1:
        type: physical
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        wal_level: replica
        hot_standby: on
        synchronous_commit: "on"
        synchronous_standby_names: "'patroni-pgcluster-qa02'"
        wal_keep_segments: 10
        max_wal_senders: 5
        max_replication_slots: 10
        wal_log_hints: on
        logging_collector: 'on'
  # some desired options for 'initdb'
  initdb: # Note: It needs to be a list (some options need values, others are switches)
    - encoding: UTF8
    - data-checksums
  # pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
  #   - host replication replicator 127.0.0.1/32 trust
  #   - local all all peer
  #   - host all all 127.0.0.1/32 scram-sha-256
  #   - host all all 10.10.11.140/32 scram-sha-256
  #   - host all all 10.10.11.141/32 scram-sha-256
  #   - host all all 10.10.11.142/32 scram-sha-256
  #   - host all all 10.10.12.253/32 scram-sha-256
  #   - local replication all peer
  #   - host replication all 127.0.0.1/32 scram-sha-256
  #   - host replication replicator 10.10.11.140/32 scram-sha-256
  #   - host replication replicator 10.10.11.141/32 scram-sha-256
  #   - host replication replicator 10.10.11.143/32 scram-sha-256
  # Some additional users which needs to be created after initializing new cluster
  users:
    admin:
      password: qaz123
      options:
        - createrole
        - createdb
    percona:
      password: qaz123
      options:
        - createrole
        - createdb
postgresql:
  cluster_name: cluster_qa
  listen: 0.0.0.0:5432
  connect_address: 10.10.11.140:5432
  data_dir: /var/lib/postgresql/15/main
  bin_dir: /usr/lib/postgresql/15/bin
  # config_dir: /etc/postgresql/15/main
  pgpass: /tmp/pgpass
  authentication:
    replication:
      username: replicator
      password: replPasswd
    superuser:
      username: postgres
      password: qaz123
  parameters:
    unix_socket_directories: /var/run/postgresql/
  create_replica_methods:
    - basebackup
  basebackup:
    checkpoint: 'fast'
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
patronictl show-config
root@patroni-pgcluster-qa01:/etc/patroni# patronictl -c /etc/patroni/patroni.yml show-config
dcs:
loop_wait: 10
maximum_lag_on_failover: 1048576
retry_timeout: 10
slots:
percona_cluster_1: null
ttl: 30
type: physical
loop_wait: 10
maximum_lag_on_failover: 1048576
pg_hba:
- local all all peer
- host all all 127.0.0.1/32 scram-sha-256
- host all all 10.10.11.140/32 scram-sha-256
- host all all 10.10.11.141/32 scram-sha-256
- host all all 10.10.11.142/32 scram-sha-256
- host all all 10.10.11.11/32 scram-sha-256
- local replication all peer
- host replication all 127.0.0.1/32 scram-sha-256
- host replication replicator 10.10.11.140/32 scram-sha-256
- host replication replicator 10.10.11.141/32 scram-sha-256
- host replication replicator 10.10.11.142/32 scram-sha-256
postgresql:
parameters:
hot_standby: true
logging_collector: 'on'
max_replication_slots: 10
max_wal_senders: 5
wal_keep_segments: 10
wal_level: replica
wal_log_hints: true
use_pg_rewind: true
use_slots: true
retry_timeout: 10
slots:
percona_cluster_1:
type: physical
ttl: 30
Patroni log files
01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:18:22 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:18:22,366 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:18:32 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:18:32,410 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:18:42 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:18:42,365 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:18:52 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:18:52,410 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:02 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:02,365 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:12 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:12,410 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:22 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:22,365 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:32 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:32,410 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
Apr 09 17:19:42 patroni-pgcluster-qa01 patroni[1037]: 2024-04-09 17:19:42,365 INFO: no action. I am (patroni-pgcluster-qa01), a secondary, and following a leader (patroni-pgcluster-qa02)
PostgreSQL log files
d by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-04-03 15:04:30.646 CEST [5474] LOG: listening on IPv6 address "::1", port 5432
2024-04-03 15:04:30.646 CEST [5474] LOG: listening on IPv4 address "127.0.0.1", port 5432
2024-04-03 15:04:30.647 CEST [5474] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-04-03 15:04:30.650 CEST [5477] LOG: database system was shut down at 2024-04-03 15:04:29 CEST
2024-04-03 15:04:30.658 CEST [5474] LOG: database system is ready to accept connections
2024-04-03 15:09:30.668 CEST [5475] LOG: checkpoint starting: time
2024-04-03 15:09:34.685 CEST [5475] LOG: checkpoint complete: wrote 43 buffers (0.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=4.014 s, sync=0.001 s, total=4.018 s; sync files=11, longest=0.001 s, average=0.001 s; distance=252 kB, estimate=252 kB
2024-04-03 15:11:26.731 CEST [5474] LOG: received fast shutdown request
2024-04-03 15:11:26.731 CEST [5474] LOG: aborting any active transactions
2024-04-03 15:11:26.734 CEST [5474] LOG: background worker "logical replication launcher" (PID 5480) exited with exit code 1
2024-04-03 15:11:26.736 CEST [5475] LOG: shutting down
2024-04-03 15:11:26.736 CEST [5475] LOG: checkpoint starting: shutdown immediate
2024-04-03 15:11:26.739 CEST [5475] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=0 kB, estimate=227 kB
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
n/a
- For synchronous mode you are not supposed to manage synchronous_standby_names on your own, but let Patroni do that by setting synchronous_mode: true in the global config (patronictl edit-config). See the documentation for details: https://patroni.readthedocs.io/en/latest/replication_modes.html#synchronous-mode
- To prevent a standby from becoming synchronous you can use the nosync: true tag: https://patroni.readthedocs.io/en/latest/yaml_configuration.html#tags
- Your global config is malformed, there should be no dcs section.
- For the future, please use Slack for questions: https://patroni.readthedocs.io/en/latest/contributing_guidelines.html#chatting
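A minimal sketch of how the points above might translate to this cluster, not a verified configuration. The first YAML document is the global (DCS) configuration as it could look in patronictl edit-config, with the stray dcs wrapper removed and the values otherwise taken from the show-config output in the report. The second is the tags section of the local patroni.yml on the node in the other DC. Putting nosync on patroni-pgcluster-qa03 is an assumption based on the stated goal that node 3 stays async.

```yaml
# Global (DCS) configuration, edited with: patronictl edit-config
# There is no top-level "dcs:" key here; that wrapper only exists
# under "bootstrap:" in patroni.yml.
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
# A real YAML boolean, not the quoted string "on".
synchronous_mode: true
slots:
  percona_cluster_1:
    type: physical
postgresql:
  use_pg_rewind: true
  use_slots: true
  # parameters: keep the existing entries from show-config, but do not add
  # synchronous_standby_names; Patroni manages it in synchronous mode.
# pg_hba: keep the existing list from show-config unchanged.
---
# Local patroni.yml on patroni-pgcluster-qa03 (assumed to be the node
# that should never be chosen as the synchronous standby).
tags:
  nofailover: true
  nosync: true
```

Changes to bootstrap.dcs in patroni.yml are only applied when a new cluster is initialized, so for an existing cluster the global part has to go through patronictl edit-config. Changing tags in the local patroni.yml typically needs a Patroni reload on that node.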
Thanks! I will keep in mind to use Slack for questions :)