patronictl -c /etc/patroni/~postgres-operator_cluster.yaml restart cluster reverts the wal_level configuration
desorenp opened this issue · comments
What happened?
Hi Team,
I want to set wal_level to minimal in an OCP environment where I have the Crunchy Postgres Operator 5.5.0 installed with PG 14.
The cluster is configured with a primary and a secondary replica.
From what I observe, no matter which wal_level I try to set, the cluster always comes up with wal_level: logical.
I tried edit-config on the primary instance set, but the value reverts to logical after a cluster restart.
I also tried setting wal_level to replica, which does not work either; it reverts to logical.
Below is a snippet from show-config
```
patronictl -c /etc/patroni/~postgres-operator_cluster.yaml show-config
wal_keep_segments: 50
wal_level: logical
work_mem: 524kB

patronictl -c /etc/patroni/~postgres-operator_cluster.yaml restart cluster-ha
```
I see the following in the primary instance logs:
```
2024-01-05 10:34:10,222 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:10,223 WARNING: postgresql parameter wal_level=minimal failed validation, defaulting to hot_standby
2024-01-05 10:34:10,228 INFO: Changed huge_pages from off to False (restart might be required)
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_count from 0 to 9
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_idle from 0 to 7200
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_interval from 0 to 75
2024-01-05 10:34:10,228 INFO: Changed wal_level from logical to replica (restart might be required)
2024-01-05 10:34:10,709 INFO: Reloading PostgreSQL configuration.
server signaled
2024-01-05 10:34:20,229 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:20,830 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:20,834 INFO: Changed huge_pages from off to False (restart might be required)
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_count from 0 to 9
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_idle from 0 to 7200
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_interval from 0 to 75
2024-01-05 10:34:20,834 WARNING: Removing invalid parameter `hugepages-2Mi` from postgresql.parameters
2024-01-05 10:34:21,409 INFO: Reloading PostgreSQL configuration.
server signaled
2024-01-05 10:34:21,442 INFO: Lock owner: flowone-cluster-flowone-db-instanceset-tv2m-0; I am flowone-cluster-flowone-db-instanceset-tv2m-0
2024-01-05 10:34:21,888 INFO: closed patroni connection to the postgresql cluster
2024-01-05 10:34:22,413 INFO: establishing a new patroni connection to the postgres cluster
2024-01-05 10:34:22.788 UTC [106976] LOG: pgaudit extension initialized
2024-01-05 10:34:22,788 INFO: postmaster pid=106976
/tmp/postgres:5432 - no response
2024-01-05 10:34:22,815 INFO: establishing a new patroni connection to the postgres cluster
2024-01-05 10:34:22.877 UTC [106976] LOG: redirecting log output to logging collector process
2024-01-05 10:34:22.877 UTC [106976] HINT: Future log output will appear in directory "log".
2024-01-05 10:34:23,185 INFO: establishing a new patroni connection to the postgres cluster
/tmp/postgres:5432 - accepting connections
/tmp/postgres:5432 - accepting connections
2024-01-05 10:34:23,954 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:25,162 INFO: not proceeding with the restart: pending restart flag is not set
2024-01-05 10:34:33,948 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:43,945 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
```
Can you please suggest the correct procedure to lower wal_level? We are seeing growth in the pg_wal volumes.
How can we reproduce it (as minimally and precisely as possible)?
Create a DB cluster with a primary and a secondary replica; it comes up with wal_level: logical.
Try to update wal_level to replica or minimal.
Restart the cluster.
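For reference, the steps above correspond roughly to these commands, run inside a database pod (a sketch; the `--pg` and `--force` options are per patronictl's `edit-config` help):

```shell
# Try to lower wal_level through Patroni's dynamic configuration
patronictl -c /etc/patroni/~postgres-operator_cluster.yaml edit-config \
  --pg wal_level=replica --force

# Restart so the restart-required change takes effect
patronictl -c /etc/patroni/~postgres-operator_cluster.yaml restart cluster-ha

# Observed result: show-config reports wal_level: logical again
patronictl -c /etc/patroni/~postgres-operator_cluster.yaml show-config | grep wal_level
```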
What did you expect to happen?
I expected wal_level to change to the configured value (replica or minimal) and to persist after the cluster restart.
Patroni/PostgreSQL/DCS version
- Patroni version: 5.5.0
- PostgreSQL version: 14.6
- DCS (and its version):
Patroni configuration file
```
---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: cluster
  labels:
    app: postgresql
    version: "14"
  namespace: fo-ns
spec:
  port: 5432
  postgresVersion: 14
  openshift: true
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.3.1-0
  backups:
    pgbackrest:
      global:
        repo1-retention-archive: "1"
        repo1-retention-archive-type: full
        repo1-retention-full: "1"
        repo1-retention-full-type: count
      repos:
        - name: repo1
          schedules:
            full: "* */6 * * *"
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteMany
              resources:
                requests:
                  storage: 200Gi
              storageClassName: ocs-storagecluster-cephfs
  users:
    - name: postgres
    - databases:
        - ilinkdb
      name: ilink
    - databases:
        - wfcdb
      name: wfc
    - databases:
        - batchapidb
      name: batchapi
    - databases:
        - catalogdb
      name: catalog
    - databases:
        - eventmanagementdb
      name: eventmanagement
    - databases:
        - archivedb
      name: archive
  proxy:
    pgBouncer:
      replicas: 1
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/cluster: cluster
                    postgres-operator.crunchydata.com/role: pgbouncer
  instances:
    - name: fl-db-instanceset
      replicas: 2
      resources:
        limits:
          cpu: '6'
          memory: 12Gi
        requests:
          cpu: '4'
          memory: 10Gi
      dataVolumeClaimSpec:
        storageClassName: ocs-storagecluster-cephfs
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 100Gi
      walVolumeClaimSpec:
        storageClassName: ocs-storagecluster-cephfs
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 100Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/cluster: cluster
                    postgres-operator.crunchydata.com/instance-set: instanceset
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          TimeZone: UTC
          max_connections: 2000
          tcp_keepalives_idle: 7200
          tcp_keepalives_count: 9
          tcp_keepalives_interval: 75
          idle_in_transaction_session_timeout: 30min
          shared_preload_libraries: pgaudit,pg_stat_statements
          track_activity_query_size: 2048
          pg_stat_statements.track: all
          huge_pages: off
          hugepages-2Mi: 0
          shared_buffers: 4GB
          effective_cache_size: 12GB
          maintenance_work_mem: 1GB
          checkpoint_completion_target: 0.9
          wal_buffers: 16MB
          default_statistics_target: 100
          random_page_cost: 1.1
          effective_io_concurrency: 200
          work_mem: 524kB
          min_wal_size: 2GB
          max_wal_size: 8GB
          max_worker_processes: 10
          max_parallel_workers_per_gather: 4
          max_parallel_workers: 10
          max_parallel_maintenance_workers: 4
          log_autovacuum_min_duration: 1min
          autovacuum_freeze_max_age: 1000000000
          autovacuum_multixact_freeze_max_age: 1000000000
          autovacuum_max_workers: 10
          autovacuum_naptime: 5s
          autovacuum_vacuum_cost_delay: 10ms
          autovacuum_vacuum_cost_limit: 1000
          vacuum_freeze_min_age: 1000000000
          vacuum_freeze_table_age: 1000000000
          wal_level: minimal
          wal_keep_segments: 50
          vacuum_multixact_freeze_min_age: 5000000
          vacuum_multixact_freeze_table_age: 150000000
```
patronictl show-config
```
sh-4.4$ patronictl -c /etc/patroni/~postgres-operator_cluster.yaml show-config
loop_wait: 10
postgresql:
  parameters:
    TimeZone: UTC
    archive_command: pgbackrest --stanza=db archive-push "%p"
    archive_mode: 'on'
    archive_timeout: 60s
    autovacuum_freeze_max_age: 1000000000
    autovacuum_max_workers: 10
    autovacuum_multixact_freeze_max_age: 1000000000
    autovacuum_naptime: 5s
    autovacuum_vacuum_cost_delay: 10ms
    autovacuum_vacuum_cost_limit: 1000
    checkpoint_completion_target: 0.9
    default_statistics_target: 100
    effective_cache_size: 12GB
    effective_io_concurrency: 200
    huge_pages: false
    hugepages-2Mi: 0
    idle_in_transaction_session_timeout: 30min
    jit: 'off'
    log_autovacuum_min_duration: 1min
    maintenance_work_mem: 1GB
    max_connections: 2000
    max_parallel_maintenance_workers: 4
    max_parallel_workers: 10
    max_parallel_workers_per_gather: 4
    max_wal_size: 8GB
    max_worker_processes: 10
    min_wal_size: 2GB
    password_encryption: scram-sha-256
    pg_stat_statements.track: all
    pgnodemx.kdapi_path: /etc/database-containerinfo
    random_page_cost: 1.1
    restore_command: pgbackrest --stanza=db archive-get %f "%p"
    shared_buffers: 4GB
    shared_preload_libraries: pgaudit,pg_stat_statements,pgnodemx,pgaudit,pg_stat_statements
    ssl: 'on'
    ssl_ca_file: /pgconf/tls/ca.crt
    ssl_cert_file: /pgconf/tls/tls.crt
    ssl_key_file: /pgconf/tls/tls.key
    tcp_keepalives_count: 9
    tcp_keepalives_idle: 7200
    tcp_keepalives_interval: 75
    track_activity_query_size: 2048
    unix_socket_directories: /tmp/postgres
    vacuum_freeze_min_age: 1000000000
    vacuum_freeze_table_age: 1000000000
    vacuum_multixact_freeze_min_age: 5000000
    vacuum_multixact_freeze_table_age: 150000000
    wal_buffers: 16MB
    wal_keep_segments: 50
    wal_level: logical
    work_mem: 524kB
  pg_hba:
    - local all "postgres" peer
    - hostssl replication "_crunchyrepl" all cert
    - hostssl "postgres" "_crunchyrepl" all cert
    - host all "_crunchyrepl" all reject
    - host all "ccp_monitoring" "127.0.0.0/8" scram-sha-256
    - host all "ccp_monitoring" "::1/128" scram-sha-256
    - host all "ccp_monitoring" all reject
    - hostssl all "_crunchypgbouncer" all scram-sha-256
    - host all "_crunchypgbouncer" all reject
    - hostssl all all all md5
  use_pg_rewind: true
  use_slots: false
ttl: 30
```
Patroni log files
```
2024-01-05 10:34:04,557 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:10,222 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:10,223 WARNING: postgresql parameter wal_level=minimal failed validation, defaulting to hot_standby
2024-01-05 10:34:10,228 INFO: Changed huge_pages from off to False (restart might be required)
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_count from 0 to 9
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_idle from 0 to 7200
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_interval from 0 to 75
2024-01-05 10:34:10,228 INFO: Changed wal_level from logical to replica (restart might be required)
2024-01-05 10:34:10,709 INFO: Reloading PostgreSQL configuration.
server signaled
2024-01-05 10:34:20,229 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:20,830 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:20,834 INFO: Changed huge_pages from off to False (restart might be required)
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_count from 0 to 9
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_idle from 0 to 7200
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_interval from 0 to 75
2024-01-05 10:34:20,834 WARNING: Removing invalid parameter `hugepages-2Mi` from postgresql.parameters
2024-01-05 10:34:21,409 INFO: Reloading PostgreSQL configuration.
server signaled
2024-01-05 10:34:21,442 INFO: Lock owner: flowone-cluster-flowone-db-instanceset-tv2m-0; I am flowone-cluster-flowone-db-instanceset-tv2m-0
2024-01-05 10:34:21,888 INFO: closed patroni connection to the postgresql cluster
2024-01-05 10:34:22,413 INFO: establishing a new patroni connection to the postgres cluster
2024-01-05 10:34:22.788 UTC [106976] LOG: pgaudit extension initialized
2024-01-05 10:34:22,788 INFO: postmaster pid=106976
/tmp/postgres:5432 - no response
2024-01-05 10:34:22,815 INFO: establishing a new patroni connection to the postgres cluster
2024-01-05 10:34:22.877 UTC [106976] LOG: redirecting log output to logging collector process
2024-01-05 10:34:22.877 UTC [106976] HINT: Future log output will appear in directory "log".
2024-01-05 10:34:23,185 INFO: establishing a new patroni connection to the postgres cluster
/tmp/postgres:5432 - accepting connections
/tmp/postgres:5432 - accepting connections
2024-01-05 10:34:23,954 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:25,162 INFO: not proceeding with the restart: pending restart flag is not set
2024-01-05 10:34:33,948 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:43,945 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:53,948 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:35:03,948 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
```
PostgreSQL log files
```
2024-01-05 10:54:51.402 UTC [113769] FATAL: the database system is shutting down
2024-01-05 10:54:51.412 UTC [113770] FATAL: the database system is shutting down
2024-01-05 10:54:51.437 UTC [113765] FATAL: the database system is shutting down
2024-01-05 10:54:51.535 UTC [107604] LOG: database system is shut down
2024-01-05 10:54:52.570 UTC [113835] LOG: starting PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20), 64-bit
2024-01-05 10:54:52.571 UTC [113835] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-01-05 10:54:52.571 UTC [113835] LOG: listening on IPv6 address "::", port 5432
2024-01-05 10:54:52.577 UTC [113835] LOG: listening on Unix socket "/tmp/postgres/.s.PGSQL.5432"
2024-01-05 10:54:52.615 UTC [113847] LOG: database system was shut down at 2024-01-05 10:54:51 UTC
2024-01-05 10:54:52.652 UTC [113835] LOG: database system is ready to accept connections
2024-01-05 10:54:55.740 UTC [113884] LOG: could not receive data from client: Connection reset by peer
2024-01-05 10:54:55.740 UTC [113880] LOG: could not receive data from client: Connection reset by peer
2024-01-05 10:54:55.740 UTC [113878] LOG: could not receive data from client: Connection reset by peer
2024-01-05 10:54:58.984 UTC [113900] LOG: could not receive data from client: Connection reset by peer
```
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
No response
I bet it's the Crunchy operator that reverts it; Patroni never changes the global configuration on its own. Note also that, as the WARNING in your logs shows, Patroni rejects wal_level=minimal during validation and falls back to hot_standby, so minimal will never apply regardless of what the operator does.
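One way to narrow this down is to compare what the CR requests with what Patroni actually holds. A diagnostic sketch (namespace and cluster name taken from the manifest above; run the second command inside a database pod):

```shell
# What the operator's spec asks for
kubectl -n fo-ns get postgrescluster cluster \
  -o jsonpath='{.spec.patroni.dynamicConfiguration.postgresql.parameters.wal_level}'

# What Patroni currently has in the DCS
patronictl -c /etc/patroni/~postgres-operator_cluster.yaml show-config | grep wal_level
```

If the CR says minimal (or replica) but show-config keeps returning logical after every reconcile, that points at the operator's reconciliation loop rather than Patroni.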