List order of sync_standby is inconsistent with synchronous_standby_names
XiuhuaRuan opened this issue
What happened?
We want to bind the standby VIP to the sync standby with the highest sync_priority, based on the sync_standby list in the DCS. However, the order of the sync_standby list is inconsistent with synchronous_standby_names: it is sorted by application name, not by sync_priority. In addition, synchronous_standby_names is not stable when synchronous_node_count changes.
To make the order of sync_standby and synchronous_standby_names more stable, I suggest using sync_priority as the third sort condition, after sync_state and lsn, when collecting the ReplicaList from pg_stat_replication. Please help evaluate this proposal. Thanks.
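The proposed ordering can be sketched as follows. This is only an illustration of a three-level sort key, not Patroni's actual code: the `Replica` tuple, the `STATE_RANK` table, and the function names are hypothetical, and the LSN values are made up for the example.

```python
from typing import List, NamedTuple


class Replica(NamedTuple):
    """Hypothetical stand-in for one row of pg_stat_replication."""
    application_name: str
    sync_state: str      # 'sync', 'quorum', 'potential' or 'async'
    lsn: int             # WAL position as an integer
    sync_priority: int   # 1 is the highest priority; 0 means no priority


# Assumed ranking of sync_state values: a lower rank sorts first.
STATE_RANK = {"sync": 0, "quorum": 1, "potential": 2, "async": 3}


def sort_key(replica: Replica):
    # sync_priority 0 means "no priority", so place it behind every real priority.
    priority = replica.sync_priority if replica.sync_priority > 0 else float("inf")
    return (
        STATE_RANK.get(replica.sync_state, len(STATE_RANK)),  # 1st: sync_state
        -replica.lsn,                                         # 2nd: highest lsn first
        priority,                                             # 3rd: sync_priority
    )


def order_replicas(replicas: List[Replica]) -> List[str]:
    return [r.application_name for r in sorted(replicas, key=sort_key)]


replicas = [
    Replica("ee_02", "sync", 0xF3000000, 2),
    Replica("ee_03", "sync", 0xF3000000, 1),
]
print(order_replicas(replicas))  # ['ee_03', 'ee_02']
```

With sync_state and lsn equal, the tie is broken by sync_priority instead of by application name, so ee_03 stays ahead of ee_02.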
How can we reproduce it (as minimally and precisely as possible)?
- Set up a cluster with 3 nodes
- Set synchronous_mode to true via patronictl edit-config; ee_03 was selected as the sync standby
- Set synchronous_node_count to 2 via patronictl edit-config; ee_03 and ee_02 were selected as sync standbys
- Check the sync standby list
  postgresql.conf:
  synchronous_standby_names = '2 (ee_03,ee_02)'
  /sync key in etcd:
  {"leader":"ee_01","sync_standby":"ee_02,ee_03"}
- Set synchronous_node_count to 1; sometimes ee_02 was selected as the sync standby, sometimes ee_03
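The mismatch seen in the check step can be made explicit with a small script. This is a minimal sketch that only parses the two values quoted above; the helper names are hypothetical, and fetching the values from postgresql.conf or etcd is left out.

```python
import json
import re
from typing import List


def parse_synchronous_standby_names(value: str) -> List[str]:
    """Return standby names in listed order from values like '2 (ee_03,ee_02)' or 'ee_03'."""
    match = re.search(r"\((.*)\)", value)
    names = match.group(1) if match else value
    return [n.strip().strip('"') for n in names.split(",") if n.strip()]


def parse_sync_key(raw: str) -> List[str]:
    """Return the sync_standby members of the /sync key in stored order."""
    sync_standby = json.loads(raw).get("sync_standby") or ""
    return [n for n in sync_standby.split(",") if n]


pg_order = parse_synchronous_standby_names("2 (ee_03,ee_02)")
dcs_order = parse_sync_key('{"leader":"ee_01","sync_standby":"ee_02,ee_03"}')

print(pg_order)   # ['ee_03', 'ee_02']
print(dcs_order)  # ['ee_02', 'ee_03']
# Same members, different order:
print(set(pg_order) == set(dcs_order), pg_order == dcs_order)  # True False
```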
What did you expect to happen?
- The sync_standby list order is consistent with synchronous_standby_names
- synchronous_standby_names changes consistently when synchronous_node_count changes. For example, when synchronous_node_count is decreased, the sync standby with the higher sync_priority is expected to remain in the list.
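The second expectation amounts to truncating a priority-ordered list. A minimal sketch, with hypothetical function names; the output format simply mirrors the values that appear in the PostgreSQL log in this report and is not a claim about how Patroni renders the setting:

```python
from typing import List


def shrink_sync_set(ordered_by_priority: List[str], node_count: int) -> List[str]:
    """Keep the first node_count names, i.e. the highest-priority standbys."""
    return ordered_by_priority[:max(node_count, 0)]


def format_standby_names(names: List[str]) -> str:
    """Render names the way the values appear in the logs in this report."""
    if len(names) <= 1:
        return "".join(names)
    return "{} ({})".format(len(names), ",".join(names))


current = ["ee_03", "ee_02"]  # ordered by sync_priority, highest first
print(format_standby_names(current))                      # 2 (ee_03,ee_02)
print(format_standby_names(shrink_sync_set(current, 1)))  # ee_03
```

Under this rule, decreasing synchronous_node_count from 2 to 1 would always leave ee_03, rather than sometimes ee_02 and sometimes ee_03.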
Patroni/PostgreSQL/DCS version
- Patroni version: 3.1.0
- PostgreSQL version: 13.0
- DCS (and its version): etcd 3.5.9
Patroni configuration file
scope: postgres-cluster
namespace: /service/
name: ee_01
restapi:
  listen: 192.168.61.105:8008
  connect_address: 192.168.61.105:8008
etcd:
  hosts: 192.168.61.105:2379,192.168.61.106:2379,192.168.61.107:2379
log:
  level: INFO
  traceback_level: INFO
  dir: /home/postgres/patroni
  file_num: 10
  file_size: 104857600
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 100
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "off"
        archive_timeout: 1800s
        logging_collector: "on"
postgresql:
  database: postgres
  listen: 0.0.0.0:5432
  connect_address: 192.168.61.105:5432
  bin_dir: /usr/local/pgsql/bin
  data_dir: /usr/local/pgsql/data
  pgpass: /home/postgres/tmp/.pgpass
  authentication:
    replication:
      username: postgres
      password: postgres
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: postgres
      password: postgres
  pg_hba:
    - local all all trust
    - host all all 0.0.0.0/0 trust
    - host all all ::1/128 trust
    - local replication all trust
    - host replication all 0.0.0.0/0 trust
    - host replication all ::1/128 trust
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
patronictl show-config
failsafe_mode: true
loop_wait: 10
master_start_timeout: 300
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    archive_command: test ! -f /home/postgres/wal_archive/%f && cp %p /home/postgres/wal_archive/%f
    archive_mode: 'on'
    archive_timeout: 1800s
    hot_standby: 'on'
    logging_collector: 'on'
    max_replication_slots: 10
    wal_keep_size: 100
    wal_level: logical
    wal_log_hints: 'on'
  use_pg_rewind: true
retry_timeout: 10
synchronous_mode: true
synchronous_node_count: 1
ttl: '30'
Patroni log files
The Patroni log shows that the list order changed when synchronous_node_count was changed from 1 to 2, which caused some confusion. At the beginning it showed ['ee_03', 'ee_02'], because list(picked) was ['ee_03', 'ee_02'] as candidates. Then it showed ['ee_02', 'ee_03'], because list(allow_promote) had changed to ['ee_02', 'ee_03'] as sync_nodes. If sync_priority were used as the third sort condition of ReplicaList when sync_state and lsn are equal, list(allow_promote) would be ['ee_03', 'ee_02'], the same as list(picked).
2024-05-09 16:12:23,737 INFO: Assigning synchronous standby status to ['ee_03']
2024-05-09 16:12:26,024 INFO: Synchronous standby status assigned to ['ee_03']
2024-05-09 16:12:26,080 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:12:33,723 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:12:43,765 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:12:48,987 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:12:59,085 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:08,985 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:19,042 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:28,938 INFO: Lock owner: ee_01; I am ee_01
**2024-05-09 16:13:28,992 INFO: Assigning synchronous standby status to ['ee_03', 'ee_02']
2024-05-09 16:13:31,303 INFO: Synchronous standby status assigned to ['ee_02', 'ee_03']**
2024-05-09 16:13:31,356 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:39,031 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:49,091 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:59,009 INFO: no action. I am (ee_01), the leader with the lock
2024-05-09 16:13:59,010 INFO: Lock owner: ee_01; I am ee_01
2024-05-09 16:13:59,059 INFO: Updating synchronous privilege temporarily from ['ee_02', 'ee_03'] to ['ee_02']
2024-05-09 16:13:59,106 INFO: Assigning synchronous standby status to ['ee_02']
2024-05-09 16:13:59,431 INFO: no action. I am (ee_01), the leader with the lock
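The reordering between the "Assigning" and "assigned" messages can be spotted automatically. The following is a rough sketch that assumes the message wording shown in the log above; the function name and the pairing logic are hypothetical.

```python
import ast
import re
from typing import List, Tuple

ASSIGNING = re.compile(r"Assigning synchronous standby status to (\[.*?\])")
ASSIGNED = re.compile(r"Synchronous standby status assigned to (\[.*?\])")


def find_reordered(lines: List[str]) -> List[Tuple[List[str], List[str]]]:
    """Pair each 'Assigning' line with the next 'assigned' line and report
    pairs that hold the same members in a different order."""
    mismatches = []
    pending = None
    for line in lines:
        m = ASSIGNING.search(line)
        if m:
            pending = ast.literal_eval(m.group(1))
            continue
        m = ASSIGNED.search(line)
        if m and pending is not None:
            assigned = ast.literal_eval(m.group(1))
            if sorted(pending) == sorted(assigned) and pending != assigned:
                mismatches.append((pending, assigned))
            pending = None
    return mismatches


log = [
    "2024-05-09 16:12:23,737 INFO: Assigning synchronous standby status to ['ee_03']",
    "2024-05-09 16:12:26,024 INFO: Synchronous standby status assigned to ['ee_03']",
    "2024-05-09 16:13:28,992 INFO: Assigning synchronous standby status to ['ee_03', 'ee_02']",
    "2024-05-09 16:13:31,303 INFO: Synchronous standby status assigned to ['ee_02', 'ee_03']",
]
print(find_reordered(log))  # [(['ee_03', 'ee_02'], ['ee_02', 'ee_03'])]
```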
PostgreSQL log files
The PostgreSQL log shows that ee_02, which has the lower priority, remained when synchronous_node_count was changed from 2 to 1. In priority-based synchronous replication, the standbys with higher priority are considered sync and the other standbys may be considered potential. Although we set a precise number of nodes in synchronous_standby_names, it is more reasonable to keep the original higher-priority node in the list.
2024-05-09 16:12:23.865 CST [6977] LOG: parameter "synchronous_standby_names" changed to "ee_03"
**2024-05-09 16:12:23.975 CST [7013] LOG: standby "ee_03" is now a synchronous standby with priority 1**
2024-05-09 16:12:23.975 CST [7013] STATEMENT: START_REPLICATION SLOT "ee_03" 0/ED000000 TIMELINE 50
2024-05-09 16:12:49.111 CST [6977] LOG: received SIGHUP, reloading configuration files
2024-05-09 16:13:29.116 CST [6977] LOG: parameter "synchronous_standby_names" changed to "2 (ee_03,ee_02)"
**2024-05-09 16:13:29.235 CST [17473] LOG: standby "ee_02" is now a synchronous standby with priority 2**
2024-05-09 16:13:29.235 CST [17473] STATEMENT: START_REPLICATION SLOT "ee_02" 0/F3000000 TIMELINE 50
2024-05-09 16:13:59.235 CST [6977] LOG: received SIGHUP, reloading configuration files
2024-05-09 16:13:59.236 CST [6977] LOG: parameter "synchronous_standby_names" changed to "ee_02"
2024-05-09 16:13:59.550 CST [6977] LOG: received SIGHUP, reloading configuration files
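To check this against the log, the priorities and the successive synchronous_standby_names values can be extracted with two regular expressions. This is a sketch that assumes the message wording shown above; `summarize` is a hypothetical helper, and the sample lines are copied from this log.

```python
import re
from typing import Dict, List, Tuple

PRIORITY = re.compile(r'standby "([^"]+)" is now a synchronous standby with priority (\d+)')
CHANGED = re.compile(r'parameter "synchronous_standby_names" changed to "([^"]*)"')


def summarize(lines: List[str]) -> Tuple[Dict[str, int], List[str]]:
    """Collect the last reported priority per standby and every value
    synchronous_standby_names was changed to, in log order."""
    priorities: Dict[str, int] = {}
    settings: List[str] = []
    for line in lines:
        m = PRIORITY.search(line)
        if m:
            priorities[m.group(1)] = int(m.group(2))
        m = CHANGED.search(line)
        if m:
            settings.append(m.group(1))
    return priorities, settings


pg_log = [
    '2024-05-09 16:12:23.865 CST [6977] LOG: parameter "synchronous_standby_names" changed to "ee_03"',
    '2024-05-09 16:12:23.975 CST [7013] LOG: standby "ee_03" is now a synchronous standby with priority 1',
    '2024-05-09 16:13:29.116 CST [6977] LOG: parameter "synchronous_standby_names" changed to "2 (ee_03,ee_02)"',
    '2024-05-09 16:13:29.235 CST [17473] LOG: standby "ee_02" is now a synchronous standby with priority 2',
    '2024-05-09 16:13:59.236 CST [6977] LOG: parameter "synchronous_standby_names" changed to "ee_02"',
]
priorities, settings = summarize(pg_log)
highest = min(priorities, key=priorities.get)  # lowest number is the highest priority
print(highest, settings[-1])  # ee_03 ee_02
```

The standby with priority 1 is ee_03, but the final setting keeps only ee_02, which is the behavior reported here.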
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
No response