zalando / patroni

A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes

Conflicting configuration between nofailover: False and failover_priority: 0. Defaulting to nofailover: False

elodiefb opened this issue

What happened?

When both "nofailover: False" and "failover_priority: 0" are configured in patroni.yml, Patroni warns "Conflicting configuration between nofailover: False and failover_priority: 0. Defaulting to nofailover: False". That sounds like "nofailover: False" takes effect, i.e. this node CAN be a failover candidate, right?
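For reference, here is a minimal sketch of the two tags that trigger the warning; my understanding from the Patroni documentation is that failover_priority: 0 means the node should never be promoted (equivalent to nofailover: true), while nofailover: false means it may be promoted, which is why the combination is reported as conflicting:

tags:
    nofailover: false        # node is allowed to be promoted
    failover_priority: 0     # node must never be promoted (same effect as nofailover: true)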

Below, both postgres_01 and postgres_03 are configured with both "nofailover: False" and "failover_priority: 0":

2023-12-14 15:58:36,235 - WARNING - Conflicting configuration between nofailover: False and failover_priority: 0. **Defaulting to nofailover: False**
+ Cluster: postgres-cluster (7303378219179270632) ---+----+-----------+----------------------+
| Member      | Host           | Role    | State     | TL | Lag in MB | Tags                 |
+-------------+----------------+---------+-----------+----+-----------+----------------------+
| postgres_01 | 123.0.0.1      | Replica | streaming | 20 |         0 | failover_priority: 0 |
| postgres_02 | 123.0.0.2      | Leader  | running   | 20 |           |                      |
| postgres_03 | 123.0.0.3      | Replica | streaming | 20 |         0 | failover_priority: 0 |
+-------------+----------------+---------+-----------+----+-----------+----------------------+

When the current leader (postgres_02) went down, neither postgres_01 nor postgres_03 was promoted, although according to the warning above one of them was supposed to be promoted:

[root@sophia-pghost5 patroni]# patronictl -c patroni.yml list
2023-12-14 16:15:30,227 - WARNING - Conflicting configuration between nofailover: False and failover_priority: 0. **Defaulting to nofailover: False**
+ Cluster: postgres-cluster (7303378219179270632) -+----+-----------+----------------------+
| Member      | Host           | Role    | State   | TL | Lag in MB | Tags                 |
+-------------+----------------+---------+---------+----+-----------+----------------------+
| postgres_01 | 123.0.0.1      | Replica | running | 20 |         0 | failover_priority: 0 |
| postgres_03 | 123.0.0.3      | Replica | running | 20 |         0 | failover_priority: 0 |
+-------------+----------------+---------+---------+----+-----------+----------------------+

How can we reproduce it (as minimally and precisely as possible)?

  1. Configure both replica nodes with both "nofailover: False" and "failover_priority: 0"; the current leader keeps the default configuration (see the sketch after this list).
  2. On the current leader, run systemctl stop patroni.
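
A rough reproduction sketch (service name, cluster name and config path are taken from my setup and may differ elsewhere):

# on each replica: add the conflicting tags to patroni.yml, then restart Patroni
#   tags:
#     nofailover: false
#     failover_priority: 0
systemctl restart patroni

# on the current leader: stop Patroni to take the leader down
systemctl stop patroni

# from any remaining node: check the cluster state; neither replica is promoted
patronictl -c patroni.yml list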

What did you expect to happen?

According to the warning message, I expected one of the two replica nodes to be promoted as the new leader.

Patroni/PostgreSQL/DCS version

  • Patroni version: 3.2.0
  • PostgreSQL version: 14.0
  • DCS (and its version): etcd 3.5.9

Patroni configuration file

scope: postgres-cluster
namespace: /service/
name: postgres_01

restapi:
  listen: 123.0.0.1:8008
  connect_address: 123.0.0.1:8008

etcd:
  hosts: 123.0.0.1:2379,123.0.0.2:2379,123.0.0.3:2379

log:
  level: INFO
  traceback_level: INFO
  dir: /home/postgres/patroni
  file_num: 10
  file_size: 104857600

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      parameters:
        listen_addresses: "*"
        port: 5432
        wal_level: replica
        hot_standby: "on"
        wal_keep_size: 100
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
        archive_mode: "off"
        archive_timeout: 1800s
 #------------log---------------------#
        logging_collector: on
        log_destination: 'stderr'
        log_truncate_on_rotation: on
        log_checkpoints: on
        log_connections: on
        log_disconnections: on
        log_error_verbosity: default
        log_lock_waits: on
        log_temp_files: 0
        log_autovacuum_min_duration: 0
        log_min_duration_statement: 50
        log_timezone: 'PRC'
        log_filename: postgresql-%Y-%m-%d_%H.log
        log_line_prefix: '%t [%p]: db=%d,user=%u,app=%a,client=%h '
#-----------------------------------

postgresql:
  database: postgres
  listen: 0.0.0.0:5432
  connect_address: 123.0.0.1:5432
  bin_dir: /usr/local/pgsql/bin
  data_dir: /usr/local/pgsql/data
  pgpass: /home/postgres/tmp/.pgpass

  authentication:
    replication:
      username: postgres
      password: postgres
    superuser:
      username: postgres
      password: postgres
    rewind:
      username: postgres
      password: postgres

  pg_hba:
  - local   all             all                                     trust
  - host    all             all             0.0.0.0/0               trust
  - host    all             all             ::1/128                 trust
  - local   replication     all                                     trust
  - host    replication     all             0.0.0.0/0               trust
  - host    replication     all             ::1/128                 trust

tags:
    nofailover: false
    failover_priority: 0
    noloadbalance: false
    clonefrom: false
    nosync: false
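
For comparison, a sketch of a tags block without the conflict, assuming the intent is to keep the node promotable; keep only one of the two keys, or give failover_priority a value of 1 or higher:

tags:
    nofailover: false
    failover_priority: 1    # or omit failover_priority entirely
    noloadbalance: false
    clonefrom: false
    nosync: false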

patronictl show-config

2023-12-14 16:48:04,536 - WARNING - Conflicting configuration between nofailover: False and failover_priority: 0. Defaulting to nofailover: False
loop_wait: 10
master_start_timeout: 300
maximum_lag_on_failover: 1048576
postgresql:
  parameters:
    archive_mode: 'off'
    archive_timeout: 1800s
    hot_standby: 'on'
    listen_addresses: '*'
    log_autovacuum_min_duration: 0
    log_checkpoints: true
    log_connections: true
    log_destination: stderr
    log_disconnections: true
    log_error_verbosity: default
    log_filename: postgresql-%Y-%m-%d_%H.log
    log_line_prefix: '%t [%p]: db=%d,user=%u,app=%a,client=%h '
    log_lock_waits: true
    log_min_duration_statement: 50
    log_temp_files: 0
    log_timezone: PRC
    log_truncate_on_rotation: true
    logging_collector: true
    max_replication_slots: 10
    max_wal_senders: 10
    port: 5432
    wal_keep_size: 100
    wal_level: replica
    wal_log_hints: 'on'
  use_pg_rewind: true
retry_timeout: 10
synchronous_mode: false
ttl: 30

Patroni log files

2023-12-14 16:12:10,513 INFO: no action. I am (postgres_01), a secondary, and following a leader (postgres_02)
2023-12-14 16:12:20,513 INFO: no action. I am (postgres_01), a secondary, and following a leader (postgres_02)
2023-12-14 16:12:23,396 INFO: following a different leader because I am not allowed to promote
2023-12-14 16:12:33,293 INFO: following a different leader because I am not allowed to promote

2023-12-14 16:12:10,520 INFO: no action. I am (postgres_03), a secondary, and following a leader (postgres_02)
2023-12-14 16:12:20,520 INFO: no action. I am (postgres_03), a secondary, and following a leader (postgres_02)
2023-12-14 16:12:23,461 INFO: following a different leader because I am not allowed to promote
2023-12-14 16:12:33,296 INFO: following a different leader because I am not allowed to promote

PostgreSQL log files

2023-12-14 16:00:38 CST [1346]: db=,user=,app=,client= LOG:  restartpoint starting: time
2023-12-14 16:00:38 CST [1346]: db=,user=,app=,client= LOG:  restartpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.004 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16384 kB, estimate=16384 kB
2023-12-14 16:00:38 CST [1346]: db=,user=,app=,client= LOG:  recovery restart point at 0/16000060
2023-12-14 16:12:21 CST [1475]: db=,user=,app=,client= FATAL:  could not receive data from WAL stream: FATAL:  terminating connection due to administrator command
2023-12-14 16:12:21 CST [1342]: db=,user=,app=,client= LOG:  record with incorrect prev-link 2D60000/0 at 0/16000148
2023-12-14 16:12:21 CST [18178]: db=,user=,app=,client= FATAL:  could not connect to the primary server: connection to server at "123.0.0.2", port 5432 failed: FATAL:  the database system is shutting down
2023-12-14 16:12:23 CST [1339]: db=,user=,app=,client= LOG:  received SIGHUP, reloading configuration files
2023-12-14 16:12:23 CST [1339]: db=,user=,app=,client= LOG:  parameter "primary_conninfo" removed from configuration file, reset to default
2023-12-14 16:12:23 CST [1339]: db=,user=,app=,client= LOG:  parameter "primary_slot_name" removed from configuration file, reset to default
2023-12-14 16:12:23 CST [18230]: db=[unknown],user=[unknown],app=[unknown],client=127.0.0.1 LOG:  connection received: host=127.0.0.1 port=55676
2023-12-14 16:12:23 CST [18230]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG:  replication connection authorized: user=postgres
2023-12-14 16:12:23 CST [18230]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG:  disconnection: session time: 0:00:00.001 user=postgres database= host=127.0.0.1 port=55676
2023-12-14 16:12:33 CST [18283]: db=[unknown],user=[unknown],app=[unknown],client=127.0.0.1 LOG:  connection received: host=127.0.0.1 port=55694
2023-12-14 16:12:33 CST [18283]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG:  replication connection authorized: user=postgres
2023-12-14 16:12:33 CST [18283]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG:  disconnection: session time: 0:00:00.001 user=postgres database= host=127.0.0.1 port=55694
2023-12-14 16:12:43 CST [18294]: db=[unknown],user=[unknown],app=[unknown],client=127.0.0.1 LOG:  connection received: host=127.0.0.1 port=55706
2023-12-14 16:12:43 CST [18294]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG:  replication connection authorized: user=postgres
2023-12-14 16:12:43 CST [18294]: db=[unknown],user=postgres,app=[unknown],client=127.0.0.1 LOG:  disconnection: session time: 0:00:00.001 user=postgres database= host=127.0.0.1 port=55706

Have you tried to use GitHub issue search?

  • Yes

Anything else we need to know?

No response