failsafe_mode doesn't work when k8s returns 409
ChenChangAo commented
What happened?
failsafe_mode doesn't work when k8s returns 409
How can we reproduce it (as minimally and precisely as possible)?
I'm not sure; maybe k8s is overloaded.
What did you expect to happen?
failsafe_mode should work when k8s returns 409
Patroni/PostgreSQL/DCS version
- Patroni version: 3.1.0
- PostgreSQL version:
- DCS (and its version): k8s
Patroni configuration file
ttl: 30
loop_wait: 10
retry_timeout: 10
failsafe_mode: true
patronictl show-config
no need
Patroni log files
2024-02-18 07:15:35,544 INFO: no action. I am (node0), the leader with the lock
2024-02-18 07:15:41,360 INFO: Lock owner: node0; I am node0
2024-02-18 07:15:46,373 ERROR: Request to server https://10.59.230.148:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.59.230.148', port=443): Read timed out. (read timeout=4.999478869140148)",)
2024-02-18 07:15:49,314 ERROR: Request to server https://10.59.230.148:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.59.230.148', port=443): Read timed out. (read timeout=2.045989267528057)",)
2024-02-18 07:15:51,369 ERROR: Request to server https://10.59.230.148:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.59.230.148', port=443): Read timed out. (read timeout=1.631961651146412)",)
2024-02-18 07:15:51,369 ERROR: Error communicating with DCS
2024-02-18 07:15:51,377 INFO: Got response from node1 http://10.59.7.15:8009/patroni: Accepted
2024-02-18 07:15:51,471 INFO: continue to run as a leader because failsafe mode is enabled and all members are accessible
2024-02-18 07:15:51,473 WARNING: Loop time exceeded, rescheduling immediately.
2024-02-18 07:15:51,474 INFO: Lock owner: node0; I am node0
2024-02-18 07:15:56,485 ERROR: Request to server https://10.59.230.148:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.59.230.148', port=443): Read timed out. (read timeout=4.989643476903439)",)
2024-02-18 07:15:59,227 ERROR: Request to server https://10.59.230.148:443 failed: ReadTimeoutError("HTTPSConnectionPool(host='10.59.230.148', port=443): Read timed out. (read timeout=2.2473123595118523)",)
2024-02-18 07:15:59,876 WARNING: Concurrent update of node-leader
2024-02-18 07:16:00,998 ERROR: failed to update leader lock
2024-02-18 07:16:00,998 INFO: Demoting self (immediate-nolock)
2024-02-18 07:16:03,956 INFO: demoted self because failed to update leader lock in DCS
PostgreSQL log files
no need
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
No response
Alexander Kukushkin commented
409 is a concurrent update.
That is, the K8s API is up and running and someone else updated the leader object.
failsafe_mode can't help with it.
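To illustrate the distinction, here is a minimal sketch. It is not Patroni's actual code: `FakeLeaderObject`, `DCSUnreachable`, `ConflictError` and `failsafe_applicable` are hypothetical names, and the in-memory object only imitates how a Kubernetes resourceVersion check rejects a stale write. A timeout means the DCS cannot be reached, which is the case failsafe mode is meant for. A 409 means the DCS answered and rejected the write.

```python
from dataclasses import dataclass


class DCSUnreachable(Exception):
    """Hypothetical: the API server did not answer (e.g. a read timeout)."""


class ConflictError(Exception):
    """Hypothetical: the API server answered with HTTP 409 Conflict."""


@dataclass
class FakeLeaderObject:
    """In-memory stand-in for a leader object guarded by a resourceVersion."""
    holder: str = "node0"
    resource_version: int = 1
    reachable: bool = True

    def update(self, holder: str, expected_version: int) -> int:
        if not self.reachable:
            raise DCSUnreachable("read timed out")
        if expected_version != self.resource_version:
            # Someone else wrote the object since we last read it.
            raise ConflictError("409: resourceVersion mismatch")
        self.holder = holder
        self.resource_version += 1
        return self.resource_version


def failsafe_applicable(obj: FakeLeaderObject, holder: str,
                        expected_version: int) -> bool:
    """Return True only when the update failed because the DCS is unreachable."""
    try:
        obj.update(holder, expected_version)
    except DCSUnreachable:
        return True   # DCS is down: checking the other members makes sense
    except ConflictError:
        return False  # DCS is up and rejected the write: failsafe does not apply
    return False      # update succeeded, nothing to fall back on
```

In the log above, the first failed cycle corresponds to the timeout branch (the leader keeps running after node1 answers "Accepted"), while "Concurrent update of node-leader" corresponds to the conflict branch.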