Manual failover candidates
elodiefb opened this issue · comments
What happened?
Manual failover now lists all members as candidates, including the current leader.
[postgres@pghost5 patroni]$ patronictl -c patroni.yml failover
Current cluster topology
+ Cluster: postgres-cluster (7303378219179270632) ---+----+-----------+
| Member | Host | Role | State | TL | Lag in MB |
+-------------+----------------+---------+-----------+----+-----------+
| postgres_01 | 123.0.0.1 | Replica | streaming | 16 | 0 |
| postgres_02 | 123.0.0.2 | Leader | running | 16 | |
| postgres_03 | 123.0.0.3 | Replica | streaming | 16 | 0 |
+-------------+----------------+---------+-----------+----+-----------+
Candidate ['postgres_01', 'postgres_02', 'postgres_03'] []: postgres_02
Are you sure you want to failover cluster postgres-cluster, demoting current leader postgres_02? [y/N]: y
Failover failed, details: 503, Failover failed
How can we reproduce it (as minimally and precisely as possible)?
patronictl -c patroni.yml failover
What did you expect to happen?
I think it would be more reasonable to remove the current leader from the failover candidates list, since it cannot actually be a failover target: even if it is selected as the candidate, the request is refused and an error is reported. Listing it as a failover candidate only causes confusion from the user's point of view - you told me it could be selected, but when I selected it, you refused...
Moreover, in previous versions the current leader was NOT listed as a failover candidate, which I think is more acceptable from the user's point of view.
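The expected filtering could be sketched like this (a minimal illustration, not Patroni's actual code; the member dicts are made up to mirror the cluster topology shown above):

```python
# Minimal sketch: build the failover candidate list by excluding the
# member that currently holds the leader role, so patronictl never
# offers a candidate that the server will refuse anyway.
members = [
    {"name": "postgres_01", "role": "replica"},
    {"name": "postgres_02", "role": "leader"},
    {"name": "postgres_03", "role": "replica"},
]

candidates = [m["name"] for m in members if m["role"] != "leader"]
print(candidates)  # ['postgres_01', 'postgres_03']
```

With this filtering, the prompt would read `Candidate ['postgres_01', 'postgres_03'] []:`, matching the pre-3.2.0 behavior described above.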
Patroni/PostgreSQL/DCS version
- Patroni version: 3.2.0
- PostgreSQL version: 14.0
- DCS (and its version): etcd 3.5.9
Patroni configuration file
scope: postgres-cluster
namespace: /service/
name: postgres_01

restapi:
  listen: 123.0.0.1:8008
  connect_address: 123.0.0.1:8008

etcd:
  hosts: 123.0.0.1:2379,123.0.0.2:2379,123.0.0.3:2379

log:
  level: INFO
  traceback_level: INFO
  dir: /home/postgres/patroni
  file_num: 10
  file_size: 104857600

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      parameters:
        listen_addresses: "*"
        port: 5432
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 100
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "off"
        archive_timeout: 1800s
        #------------log---------------------#
        logging_collector: on
        log_destination: 'stderr'
        log_truncate_on_rotation: on
        log_checkpoints: on
        log_connections: on
        log_disconnections: on
        log_error_verbosity: default
        log_lock_waits: on
        log_temp_files: 0
        log_autovacuum_min_duration: 0
        log_min_duration_statement: 50
        log_timezone: 'PRC'
        log_filename: postgresql-%Y-%m-%d_%H.log
        log_line_prefix: '%t [%p]: db=%d,user=%u,app=%a,client=%h '
        #-----------------------------------

postgresql:
  database: postgres
  listen: 0.0.0.0:5432
  connect_address: 123.0.0.1:5432
  bin_dir: /usr/local/pgsql/bin
  data_dir: /usr/local/pgsql/data
  pgpass: /home/postgres/tmp/.pgpass
  authentication:
    replication:
      username: postgres
      password: postgres
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: postgres
      password: postgres
  pg_hba:
    - local all all trust
    - host all all 0.0.0.0/0 trust
    - host all all ::1/128 trust
    - local replication all trust
    - host replication all 0.0.0.0/0 trust
    - host replication all ::1/128 trust

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
patronictl show-config
loop_wait: 10
master_start_timeout: 300
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    archive_mode: 'off'
    archive_timeout: 1800s
    hot_standby: 'on'
    listen_addresses: '*'
    log_autovacuum_min_duration: 0
    log_checkpoints: true
    log_connections: true
    log_destination: stderr
    log_disconnections: true
    log_error_verbosity: default
    log_filename: postgresql-%Y-%m-%d_%H.log
    log_line_prefix: '%t [%p]: db=%d,user=%u,app=%a,client=%h '
    log_lock_waits: true
    log_min_duration_statement: 50
    log_temp_files: 0
    log_timezone: PRC
    log_truncate_on_rotation: true
    logging_collector: true
    max_replication_slots: 10
    max_wal_senders: 10
    port: 5432
    wal_keep_size: 100
    wal_level: replica
    wal_log_hints: 'on'
  use_pg_rewind: true
retry_timeout: 10
synchronous_mode: false
ttl: 30
Patroni log files
2023-12-12 17:07:08,054 INFO: no action. I am (postgres_02), the leader with the lock
2023-12-12 17:07:18,046 INFO: no action. I am (postgres_02), the leader with the lock
2023-12-12 17:07:18,775 INFO: received failover request with leader=None candidate=postgres_02 scheduled_at=None
2023-12-12 17:07:18,778 INFO: Got response from postgres_02 http://123.0.0.2:8008/patroni: {"state": "running", "postmaster_start_time": "2023-12-12 16:09:53.667173+08:00", "role": "master", "server_version": 140000, "xlog": {"location": 268435456}, "timeline": 16, "replication": [{"usename": "postgres", "application_name": "postgres_01", "client_addr": "123.0.0.1", "state": "streaming", "sync_state": "async", "sync_priority": 0}, {"usename": "postgres", "application_name": "postgres_03", "client_addr": "123.0.0.3", "state": "streaming", "sync_state": "async", "sync_priority": 0}], "dcs_last_seen": 1702372038, "database_system_identifier": "7303378219179270632", "patroni": {"version": "3.2.0", "scope": "postgres-cluster", "name": "postgres_02"}}
2023-12-12 17:07:18,783 INFO: Lock owner: postgres_02; I am postgres_02
2023-12-12 17:07:18,786 WARNING: manual failover: I am already the leader, no need to failover
2023-12-12 17:07:18,786 INFO: Cleaning up failover key
2023-12-12 17:07:18,791 INFO: no action. I am (postgres_02), the leader with the lock
2023-12-12 17:07:28,791 INFO: no action. I am (postgres_02), the leader with the lock
2023-12-12 17:07:38,791 INFO: no action. I am (postgres_02), the leader with the lock
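The rejection visible in the log above (leader receives a failover request naming itself, logs a warning, and cleans up the failover key) could be sketched roughly like this - a simplified illustration only, not Patroni's real implementation:

```python
# Simplified sketch of the server-side behavior seen in the log: a manual
# failover request whose candidate is already the lock owner is rejected,
# which surfaces on the patronictl side as "503, Failover failed".
def handle_failover_request(lock_owner: str, candidate: str) -> tuple[bool, str]:
    if candidate == lock_owner:
        # corresponds to the logged warning and "Cleaning up failover key"
        return False, "manual failover: I am already the leader, no need to failover"
    return True, f"failing over to {candidate}"

print(handle_failover_request("postgres_02", "postgres_02"))
# (False, 'manual failover: I am already the leader, no need to failover')
```

Since this check always rejects the current leader, offering it in the candidate prompt serves no purpose.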
PostgreSQL log files
N/A
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
No response