Master Postgres Pod Was Out Of Memory - Postgres Was Inaccessible.
avi1818 opened this issue
What happened?
Hi,
The master Postgres pod ran out of memory and, as a result, Postgres was inaccessible. I thought that Patroni would either promote one of the slaves or restart the pod that was the master.
What is the expected behavior in such a case?
kernel: postgres invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=993
kernel: [] oom_kill_process+0x2cd/0x490
kernel: Task in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7eab2e7a_917c_4996_bb0f_0ccbb75719ce.slice/docker-df58adbf214854f03fbdef15ac336f1b319038a93995a993a43abe286bbca815.scope killed as a result of limit of /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod7eab2e7a_917c_4996_bb0f_0ccbb75719ce.slice
kernel: Memory cgroup out of memory: Kill process 23316 (postgres) score 1033 or sacrifice child
kernel: Killed process 23316 (postgres), UID 101, total-vm:8221844kB, anon-rss:1195384kB, file-rss:19160kB, shmem-rss:112164kB
abrt-hook-ccpp: Process 438 (postgres) of user 101 killed by SIGABRT - dumping core
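For context, the "Killed process" line above reports the backend's resident memory at the moment of the kill; summing its components (values copied from the log) shows roughly how much the process alone was holding against the pod's cgroup limit:

```python
# Resident-memory components (kB) for PID 23316, copied verbatim
# from the kernel "Killed process" line above.
anon_rss_kb = 1195384   # anonymous pages (heap, sorts, hash tables)
file_rss_kb = 19160     # file-backed pages
shmem_rss_kb = 112164   # shared memory (largely shared_buffers pages)

total_kb = anon_rss_kb + file_rss_kb + shmem_rss_kb
print(f"total RSS ~ {total_kb} kB ~ {total_kb / 1024**2:.2f} GiB")
```

Note that the kill was triggered by the memory cgroup limit in the path shown above, not by the node running out of memory.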
How can we reproduce it (as minimally and precisely as possible)?
none
What did you expect to happen?
none
Patroni/PostgreSQL/DCS version
- Patroni version: 2.0.1
- PostgreSQL version: 13
- DCS (and its version): Kubernetes (version not provided)
Patroni configuration file
none
patronictl show-config
2024-02-29 19:23:51,905 - WARNING - Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
loop_wait: 10
maximum_lag_on_failover: 33554432
postgresql:
  parameters:
    archive_mode: false
    archive_timeout: 1800s
    autovacuum_analyze_scale_factor: 0.02
    autovacuum_max_workers: 5
    autovacuum_vacuum_scale_factor: 0.05
    checkpoint_completion_target: 0.9
    hot_standby: 'on'
    log_autovacuum_min_duration: 0
    log_checkpoints: 'on'
    log_connections: 'on'
    log_destination: stderr
    log_disconnections: 'on'
    log_line_prefix: '%t [%p]: [%l-1] %c %x %d %u %a %h '
    log_lock_waits: 'on'
    log_min_duration_statement: 500
    log_statement: ddl
    log_temp_files: 0
    logging_collector: false
    max_connections: 533
    max_logical_replication_workers: 90
    max_replication_slots: 90
    max_slot_wal_keep_size: 5000
    max_wal_senders: 90
    max_worker_processes: '90'
    tcp_keepalives_idle: 900
    tcp_keepalives_interval: 100
    track_commit_timestamp: 'on'
    track_functions: all
    wal_level: logical
    wal_log_hints: 'on'
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
ttl: 30
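One thing worth noting about the config above: with max_connections at 533 and no work_mem shown (so presumably the PostgreSQL 13 default of 4 MB), the aggregate per-backend memory can dwarf a small pod limit. A rough back-of-the-envelope sketch, with the per-backend overhead figure being an assumption and keeping in mind a single query can use several work_mem-sized allocations:

```python
# Rough worst-case memory heuristic for the config shown above.
max_connections = 533
work_mem_mb = 4              # PostgreSQL 13 default; not set in the shown config
per_backend_overhead_mb = 5  # rough per-process overhead (assumption)

worst_case_mb = max_connections * (work_mem_mb + per_backend_overhead_mb)
print(f"~{worst_case_mb} MB if all {max_connections} backends sort at once")
```

This is a floor rather than a ceiling, but it already suggests the connection count alone can exceed a pod limit in the low gigabytes.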
Patroni log files
none
PostgreSQL log files
none
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
none
I thought that Patroni would either promote one of the slaves or restart the pod that was master.

There are no "slaves" in the PostgreSQL world (the term is "standby" or "replica"). What Patroni does depends on many things. By default it will start the failed Postgres back up, and I am quite confident that it did exactly that; the proof should be in the Patroni logs, but you didn't check or provide them.
Patroni version: 2.0.1
This is a very old version; please update to the latest (3.2.2 at the time of writing) ASAP.
In general, OOM is not a Patroni problem. It is your task to give Patroni/Postgres enough resources to work.
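For example, one way to give the pod more headroom is to raise the container's memory request and limit in the pod spec. This is an illustrative fragment only; the sizes are placeholders, not taken from the reporter's deployment:

```yaml
# Illustrative container-spec fragment for the Postgres pod;
# sizes are placeholders and should be matched to actual usage.
resources:
  requests:
    memory: "4Gi"
  limits:
    memory: "4Gi"
```

Setting requests equal to limits also moves the pod out of the burstable QoS class visible in the cgroup path above, which gives its processes a much less aggressive oom_score_adj. Independently, shared_buffers, work_mem, and max_connections should be sized so that the worst case fits under the limit, since here the memory cgroup limit, not the node, triggered the kill.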