Conflicting configuration between nofailover: False and failover_priority: 0. Defaulting to nofailover: False
elodiefb opened this issue
What happened?
When both "nofailover: False" and "failover_priority: 0" are configured in patroni.yml, Patroni warns: "Conflicting configuration between nofailover: False and failover_priority: 0. Defaulting to nofailover: False". I read this as "nofailover: False" taking effect, that is to say, this node CAN be a failover candidate, right?
Below, postgres_01 and postgres_03 are both configured with "nofailover: False" and "failover_priority: 0":
2023-12-14 15:58:36,235 - WARNING - Conflicting configuration between nofailover: False and failover_priority: 0. **Defaulting to nofailover: False**
+ Cluster: postgres-cluster (7303378219179270632) ---+----+-----------+----------------------+
| Member | Host | Role | State | TL | Lag in MB | Tags |
+-------------+----------------+---------+-----------+----+-----------+----------------------+
| postgres_01 | 123.0.0.1 | Replica | streaming | 20 | 0 | failover_priority: 0 |
| postgres_02 | 123.0.0.2 | Leader | running | 20 | | |
| postgres_03 | 123.0.0.3 | Replica | streaming | 20 | 0 | failover_priority: 0 |
+-------------+----------------+---------+-----------+----+-----------+----------------------+
When the current leader (postgres_02) went down, neither postgres_01 nor postgres_03 was promoted, although according to the warning above one of them should have been:
[root@sophia-pghost5 patroni]# patronictl -c patroni.yml list
2023-12-14 16:15:30,227 - WARNING - Conflicting configuration between nofailover: False and failover_priority: 0. **Defaulting to nofailover: False**
+ Cluster: postgres-cluster (7303378219179270632) -+----+-----------+----------------------+
| Member | Host | Role | State | TL | Lag in MB | Tags |
+-------------+----------------+---------+---------+----+-----------+----------------------+
| postgres_01 | 123.0.0.1 | Replica | running | 20 | 0 | failover_priority: 0 |
| postgres_03 | 123.0.0.3 | Replica | running | 20 | 0 | failover_priority: 0 |
+-------------+----------------+---------+---------+----+-----------+----------------------+
How can we reproduce it (as minimally and precisely as possible)?
- Configure both replica nodes with both "nofailover: False" and "failover_priority: 0"; the current leader uses the default configuration.
- On the current leader, run systemctl stop patroni.
What did you expect to happen?
Based on the warning, I expected one of the two replica nodes to be promoted to the new leader.
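The expectation above can be written down as a small Python sketch. It models only the reporter's reading of the warning text (an explicit nofailover wins over failover_priority) and is not Patroni's implementation. The function name resolve_failover_tags is hypothetical, and treating a priority of 0 or below as "never promote" is an assumption, as is warning on the opposite mismatch (nofailover: True with a positive priority).

```python
from typing import Any, Mapping, Optional, Tuple


def resolve_failover_tags(tags: Mapping[str, Any]) -> Tuple[bool, Optional[str]]:
    """Return (eligible, warning) for a node's tags, as the warning text implies.

    Hypothetical helper: it models the reporter's reading of the warning,
    not Patroni's internal implementation.
    """
    nofailover = tags.get("nofailover")
    priority = tags.get("failover_priority")

    # Assumption: a priority of 0 or below means "never promote this node".
    priority_blocks = priority is not None and priority <= 0

    warning = None
    if nofailover is not None and priority is not None and bool(nofailover) != priority_blocks:
        # The two tags disagree about whether the node may be promoted.
        warning = (
            f"Conflicting configuration between nofailover: {nofailover} "
            f"and failover_priority: {priority}. "
            f"Defaulting to nofailover: {nofailover}"
        )

    # Per the warning, an explicit nofailover wins over failover_priority.
    if nofailover is not None:
        return (not bool(nofailover), warning)
    return (not priority_blocks, warning)


if __name__ == "__main__":
    eligible, warning = resolve_failover_tags({"nofailover": False, "failover_priority": 0})
    print(eligible)
    print(warning)
```

Under this model, the reported tags make the node eligible for promotion, which is the expectation stated here. A node carrying only "failover_priority: 0" (as the Tags column of the listing shows) would not be eligible. Whether that difference explains the observed behaviour is not confirmed by anything in this report.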
Patroni/PostgreSQL/DCS version
- Patroni version: 3.2.0
- PostgreSQL version: 14.0
- DCS (and its version): etcd 3.5.9
Patroni configuration file
scope: postgres-cluster
namespace: /service/
name: postgres_01
restapi:
  listen: 123.0.0.1:8008
  connect_address: 123.0.0.1:8008
etcd:
  hosts: 123.0.0.1:2379,123.0.0.2:2379,123.0.0.3:2379
log:
  level: INFO
  traceback_level: INFO
  dir: /home/postgres/patroni
  file_num: 10
  file_size: 104857600
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      parameters:
        listen_addresses: "*"
        port: 5432
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 100
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "off"
        archive_timeout: 1800s
        #------------log---------------------#
        logging_collector: on
        log_destination: 'stderr'
        log_truncate_on_rotation: on
        log_checkpoints: on
        log_connections: on
        log_disconnections: on
        log_error_verbosity: default
        log_lock_waits: on
        log_temp_files: 0
        log_autovacuum_min_duration: 0
        log_min_duration_statement: 50
        log_timezone: 'PRC'
        log_filename: postgresql-%Y-%m-%d_%H.log
        log_line_prefix: '%t [%p]: db=%d,user=%u,app=%a,client=%h '
        #-----------------------------------
postgresql:
  database: postgres
  listen: 0.0.0.0:5432
  connect_address: 123.0.0.1:5432
  bin_dir: /usr/local/pgsql/bin
  data_dir: /usr/local/pgsql/data
  pgpass: /home/postgres/tmp/.pgpass
  authentication:
    replication:
      username: postgres
      password: postgres
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: postgres
      password: postgres
  pg_hba:
    - local all all trust
    - host all all 0.0.0.0/0 trust
    - host all all ::1/128 trust
    - local replication all trust
    - host replication all 0.0.0.0/0 trust
    - host replication all ::1/128 trust
tags:
  nofailover: false
  failover_priority: 0
  noloadbalance: false
  clonefrom: false
  nosync: false
patronictl show-config
2023-12-14 16:48:04,536 - WARNING - Conflicting configuration between nofailover: False and failover_priority: 0. Defaulting to nofailover: False
loop_wait: 10
master_start_timeout: 300
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    archive_mode: 'off'
    archive_timeout: 1800s
    hot_standby: 'on'
    listen_addresses: '*'
    log_autovacuum_min_duration: 0
    log_checkpoints: true
    log_connections: true
    log_destination: stderr
    log_disconnections: true
    log_error_verbosity: default
    log_filename: postgresql-%Y-%m-%d_%H.log
    log_line_prefix: '%t [%p]: db=%d,user=%u,app=%a,client=%h '
    log_lock_waits: true
    log_min_duration_statement: 50
    log_temp_files: 0
    log_timezone: PRC
    log_truncate_on_rotation: true
    logging_collector: true
    max_replication_slots: 10
    max_wal_senders: 10
    port: 5432
    wal_keep_size: 100
    wal_level: replica
    wal_log_hints: 'on'
  use_pg_rewind: true
retry_timeout: 10
synchronous_mode: false
ttl: 30
Patroni log files
2023-12-14 16:12:10,513 INFO: no action. I am (postgres_01), a secondary, and following a leader (postgres_02)
2023-12-14 16:12:20,513 INFO: no action. I am (postgres_01), a secondary, and following a leader (postgres_02)
2023-12-14 16:12:23,396 INFO: following a different leader because I am not allowed to promote
2023-12-14 16:12:33,293 INFO: following a different leader because I am not allowed to promote
2023-12-14 16:12:10,520 INFO: no action. I am (postgres_03), a secondary, and following a leader (postgres_02)
2023-12-14 16:12:20,520 INFO: no action. I am (postgres_03), a secondary, and following a leader (postgres_02)
2023-12-14 16:12:23,461 INFO: following a different leader because I am not allowed to promote
2023-12-14 16:12:33,296 INFO: following a different leader because I am not allowed to promote
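To find the moments at which a node declined to promote itself, the log excerpts above can be scanned with a short Python sketch. The function name refusal_timestamps and the line pattern are assumptions based only on the log lines shown in this report; other Patroni log formats may need a different pattern.

```python
import re
from typing import List

# Phrase taken from the log lines quoted in this report.
REFUSAL = "not allowed to promote"
# Assumed layout: "<date> <time>,<millis> <LEVEL>: <message>".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} (\w+): (.*)$")


def refusal_timestamps(log_text: str) -> List[str]:
    """Return timestamps of log lines where the node declined to promote."""
    stamps = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and REFUSAL in match.group(3):
            stamps.append(match.group(1))
    return stamps


if __name__ == "__main__":
    sample = """\
2023-12-14 16:12:20,513 INFO: no action. I am (postgres_01), a secondary, and following a leader (postgres_02)
2023-12-14 16:12:23,396 INFO: following a different leader because I am not allowed to promote
2023-12-14 16:12:33,293 INFO: following a different leader because I am not allowed to promote
"""
    print(refusal_timestamps(sample))
```

In the excerpts above, the refusals begin at 16:12:23, shortly after the last "following a leader (postgres_02)" line at 16:12:20.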
PostgreSQL log files
2023-12-14 16:00:38 CST [1346]: db=,user=,app=,client= LOG: restartpoint starting: time
2023-12-14 16:00:38 CST [1346]: db=,user=,app=,client= LOG: restartpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.004 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16384 kB, estimate=16384 kB
2023-12-14 16:00:38 CST [1346]: db=,user=,app=,client= LOG: recovery restart point at 0/16000060
2023-12-14 16:12:21 CST [1475]: db=,user=,app=,client= FATAL: could not receive data from WAL stream: FATAL: terminating connection due to administrator command
2023-12-14 16:12:21 CST [1342]: db=,user=,app=,client= LOG: record with incorrect prev-link 2D60000/0 at 0/16000148
2023-12-14 16:12:21 CST [18178]: db=,user=,app=,client= FATAL: could not connect to the primary server: connection to server at "123.0.0.2", port 5432 failed: FATAL: the database system is shutting down
2023-12-14 16:12:23 CST [1339]: db=,user=,app=,client= LOG: received SIGHUP, reloading configuration files
2023-12-14 16:12:23 CST [1339]: db=,user=,app=,client= LOG: parameter "primary_conninfo" removed from configuration file, reset to default
2023-12-14 16:12:23 CST [1339]: db=,user=,app=,client= LOG: parameter "primary_slot_name" removed from configuration file, reset to default
2023-12-14 16:12:23 CST [18230]: db=[unknown],user=[unknown],app=[unknown],client=127.0.0.1 LOG: connection received: host=127.0.0.1 port=55676
2023-12-14 16:12:23 CST [18230]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG: replication connection authorized: user=postgres
2023-12-14 16:12:23 CST [18230]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG: disconnection: session time: 0:00:00.001 user=postgres database= host=127.0.0.1 port=55676
2023-12-14 16:12:33 CST [18283]: db=[unknown],user=[unknown],app=[unknown],client=127.0.0.1 LOG: connection received: host=127.0.0.1 port=55694
2023-12-14 16:12:33 CST [18283]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG: replication connection authorized: user=postgres
2023-12-14 16:12:33 CST [18283]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG: disconnection: session time: 0:00:00.001 user=postgres database= host=127.0.0.1 port=55694
2023-12-14 16:12:43 CST [18294]: db=[unknown],user=[unknown],app=[unknown],client=127.0.0.1 LOG: connection received: host=127.0.0.1 port=55706
2023-12-14 16:12:43 CST [18294]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG: replication connection authorized: user=postgres
2023-12-14 16:12:43 CST [18294]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG: disconnection: session time: 0:00:00.001 user=postgres database= host=127.0.0.1 port=55706
Have you tried to use GitHub issue search?
- Yes