zalando / patroni

A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes

patronictl -c /etc/patroni/~postgres-operator_cluster.yaml restart cluster is reverting the wal_level configuration

desorenp opened this issue

What happened?

Hi Team,

I want to set the WAL level (wal_level) to minimal in an OCP environment, where I have the Crunchy Postgres operator 5.5.0 installed with PG 14.
The cluster is configured with a primary and a secondary replica.

From my observation, whatever wal_level I try to set, when the cluster is created it comes up with wal_level: logical.
I tried edit-config on the primary instance set, but the value reverts back to logical after a cluster restart.

I also tried configuring wal_level to replica, which does not seem to work either; it reverts back to logical.
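
For reference, the parameter in question is carried in the operator's dynamicConfiguration block of the PostgresCluster spec; below is a minimal excerpt of that block as it appears in the full manifest further down (the inline comment reflects the validation warning visible in the Patroni logs):

```
# Excerpt from spec.patroni.dynamicConfiguration in the manifest below
patroni:
  dynamicConfiguration:
    postgresql:
      parameters:
        wal_level: minimal   # Patroni reports this value as failing validation and defaults to hot_standby
```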

Below is a snippet from show-config

patronictl -c /etc/patroni/~postgres-operator_cluster.yaml show-config
    wal_keep_segments: 50
    wal_level: logical
    work_mem: 524kB

patronictl -c /etc/patroni/~postgres-operator_cluster.yaml restart cluster-ha

I see the logs below in the primary instance logs:

```
2024-01-05 10:34:10,222 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:10,223 WARNING: postgresql parameter wal_level=minimal failed validation, defaulting to hot_standby
2024-01-05 10:34:10,228 INFO: Changed huge_pages from off to False (restart might be required)
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_count from 0 to 9
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_idle from 0 to 7200
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_interval from 0 to 75
2024-01-05 10:34:10,228 INFO: Changed wal_level from logical to replica (restart might be required)
2024-01-05 10:34:10,709 INFO: Reloading PostgreSQL configuration.
server signaled
2024-01-05 10:34:20,229 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:20,830 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:20,834 INFO: Changed huge_pages from off to False (restart might be required)
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_count from 0 to 9
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_idle from 0 to 7200
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_interval from 0 to 75
2024-01-05 10:34:20,834 WARNING: Removing invalid parameter `hugepages-2Mi` from postgresql.parameters
2024-01-05 10:34:21,409 INFO: Reloading PostgreSQL configuration.
server signaled
2024-01-05 10:34:21,442 INFO: Lock owner: flowone-cluster-flowone-db-instanceset-tv2m-0; I am flowone-cluster-flowone-db-instanceset-tv2m-0
2024-01-05 10:34:21,888 INFO: closed patroni connection to the postgresql cluster
2024-01-05 10:34:22,413 INFO: establishing a new patroni connection to the postgres cluster
2024-01-05 10:34:22.788 UTC [106976] LOG:  pgaudit extension initialized
2024-01-05 10:34:22,788 INFO: postmaster pid=106976
/tmp/postgres:5432 - no response
2024-01-05 10:34:22,815 INFO: establishing a new patroni connection to the postgres cluster
2024-01-05 10:34:22.877 UTC [106976] LOG:  redirecting log output to logging collector process
2024-01-05 10:34:22.877 UTC [106976] HINT:  Future log output will appear in directory "log".
2024-01-05 10:34:23,185 INFO: establishing a new patroni connection to the postgres cluster
/tmp/postgres:5432 - accepting connections
/tmp/postgres:5432 - accepting connections
2024-01-05 10:34:23,954 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:25,162 INFO: not proceeding with the restart: pending restart flag is not set
2024-01-05 10:34:33,948 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:43,945 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
```

Can you please suggest the correct procedure to lower wal_level, as we are seeing an increase in the pg_wal volumes?
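
One way to confirm which value is actually in effect on the running primary, and where it comes from, is to query pg_settings directly. A minimal sketch, assuming the postgres superuser and the Unix socket directory shown in the configuration (/tmp/postgres):

```
psql -h /tmp/postgres -p 5432 -U postgres -c \
  "SELECT name, setting, source, sourcefile FROM pg_settings WHERE name = 'wal_level';"
```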

How can we reproduce it (as minimally and precisely as possible)?

1. Create a DB cluster with a primary and a secondary replica, with wal_level: logical.
2. Try to update the config to replica or minimal.
3. Restart the cluster (see the command sketch below).
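
Condensed into commands, the reproduction looks roughly like the sketch below; the -p and --force flags are assumptions about how the change was applied, while the config path and cluster name are taken from this report:

```
# Set wal_level through Patroni's dynamic configuration
patronictl -c /etc/patroni/~postgres-operator_cluster.yaml edit-config -p wal_level=replica --force

# Restart the cluster so the change would take effect
patronictl -c /etc/patroni/~postgres-operator_cluster.yaml restart cluster-ha --force

# Check the stored configuration; wal_level reportedly comes back as 'logical'
patronictl -c /etc/patroni/~postgres-operator_cluster.yaml show-config
```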

What did you expect to happen?

I expected wal_level to be updated to the configured value (replica or minimal) and to persist after the cluster restart, instead of reverting to logical.

Patroni/PostgreSQL/DCS version

  • Patroni version: 5.5.0
  • PostgreSQL version: 14.6
  • DCS (and its version):

Patroni configuration file

---
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: cluster
  labels:
    app: postgresql
    version: "14"
  namespace: fo-ns
spec:
  port: 5432
  postgresVersion: 14
  openshift: true
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.3.1-0
  backups:
    pgbackrest:
      global:
        repo1-retention-archive: "1"
        repo1-retention-archive-type: full
        repo1-retention-full: "1"
        repo1-retention-full-type: count
      repos:
        - name: repo1
          schedules:
            full: "* */6 * * *"
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteMany
              resources:
                requests:
                  storage: 200Gi
              storageClassName: ocs-storagecluster-cephfs
  users:
    - name: postgres
    - databases:
        - ilinkdb
      name: ilink
    - databases:
        - wfcdb
      name: wfc
    - databases:
        - batchapidb
      name: batchapi
    - databases:
        - catalogdb
      name: catalog
    - databases:
        - eventmanagementdb
      name: eventmanagement
    - databases:
        - archivedb
      name: archive
  proxy:
    pgBouncer:
      replicas: 1
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/cluster: cluster
                    postgres-operator.crunchydata.com/role: pgbouncer
  instances:
    - name: fl-db-instanceset
      replicas: 2
      resources:
        limits:
          cpu: '6'
          memory: 12Gi
        requests:
          cpu: '4'
          memory: 10Gi
      dataVolumeClaimSpec:
        storageClassName: ocs-storagecluster-cephfs
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 100Gi
      walVolumeClaimSpec:
        storageClassName: ocs-storagecluster-cephfs
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 100Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    postgres-operator.crunchydata.com/cluster: cluster
                    postgres-operator.crunchydata.com/instance-set: instanceset
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          TimeZone: UTC
          max_connections: 2000
          tcp_keepalives_idle: 7200
          tcp_keepalives_count: 9
          tcp_keepalives_interval: 75
          idle_in_transaction_session_timeout: 30min
          shared_preload_libraries: pgaudit,pg_stat_statements
          track_activity_query_size: 2048
          pg_stat_statements.track: all
          huge_pages: off
          hugepages-2Mi: 0
          shared_buffers: 4GB
          effective_cache_size: 12GB
          maintenance_work_mem: 1GB
          checkpoint_completion_target: 0.9
          wal_buffers: 16MB
          default_statistics_target: 100
          random_page_cost: 1.1
          effective_io_concurrency: 200
          work_mem: 524kB
          min_wal_size: 2GB
          max_wal_size: 8GB
          max_worker_processes: 10
          max_parallel_workers_per_gather: 4
          max_parallel_workers: 10
          max_parallel_maintenance_workers: 4
          log_autovacuum_min_duration: 1min
          autovacuum_freeze_max_age: 1000000000
          autovacuum_multixact_freeze_max_age: 1000000000
          autovacuum_max_workers: 10
          autovacuum_naptime: 5s
          autovacuum_vacuum_cost_delay: 10ms
          autovacuum_vacuum_cost_limit: 1000
          vacuum_freeze_min_age: 1000000000
          vacuum_freeze_table_age: 1000000000
          wal_level: minimal
          wal_keep_segments: 50
          vacuum_multixact_freeze_min_age: 5000000
          vacuum_multixact_freeze_table_age: 150000000

patronictl show-config

sh-4.4$ patronictl -c /etc/patroni/~postgres-operator_cluster.yaml show-config
loop_wait: 10
postgresql:
  parameters:
    TimeZone: UTC
    archive_command: pgbackrest --stanza=db archive-push "%p"
    archive_mode: 'on'
    archive_timeout: 60s
    autovacuum_freeze_max_age: 1000000000
    autovacuum_max_workers: 10
    autovacuum_multixact_freeze_max_age: 1000000000
    autovacuum_naptime: 5s
    autovacuum_vacuum_cost_delay: 10ms
    autovacuum_vacuum_cost_limit: 1000
    checkpoint_completion_target: 0.9
    default_statistics_target: 100
    effective_cache_size: 12GB
    effective_io_concurrency: 200
    huge_pages: false
    hugepages-2Mi: 0
    idle_in_transaction_session_timeout: 30min
    jit: 'off'
    log_autovacuum_min_duration: 1min
    maintenance_work_mem: 1GB
    max_connections: 2000
    max_parallel_maintenance_workers: 4
    max_parallel_workers: 10
    max_parallel_workers_per_gather: 4
    max_wal_size: 8GB
    max_worker_processes: 10
    min_wal_size: 2GB
    password_encryption: scram-sha-256
    pg_stat_statements.track: all
    pgnodemx.kdapi_path: /etc/database-containerinfo
    random_page_cost: 1.1
    restore_command: pgbackrest --stanza=db archive-get %f "%p"
    shared_buffers: 4GB
    shared_preload_libraries: pgaudit,pg_stat_statements,pgnodemx,pgaudit,pg_stat_statements
    ssl: 'on'
    ssl_ca_file: /pgconf/tls/ca.crt
    ssl_cert_file: /pgconf/tls/tls.crt
    ssl_key_file: /pgconf/tls/tls.key
    tcp_keepalives_count: 9
    tcp_keepalives_idle: 7200
    tcp_keepalives_interval: 75
    track_activity_query_size: 2048
    unix_socket_directories: /tmp/postgres
    vacuum_freeze_min_age: 1000000000
    vacuum_freeze_table_age: 1000000000
    vacuum_multixact_freeze_min_age: 5000000
    vacuum_multixact_freeze_table_age: 150000000
    wal_buffers: 16MB
    wal_keep_segments: 50
    wal_level: logical
    work_mem: 524kB
  pg_hba:
  - local all "postgres" peer
  - hostssl replication "_crunchyrepl" all cert
  - hostssl "postgres" "_crunchyrepl" all cert
  - host all "_crunchyrepl" all reject
  - host all "ccp_monitoring" "127.0.0.0/8" scram-sha-256
  - host all "ccp_monitoring" "::1/128" scram-sha-256
  - host all "ccp_monitoring" all reject
  - hostssl all "_crunchypgbouncer" all scram-sha-256
  - host all "_crunchypgbouncer" all reject
  - hostssl all all all md5
  use_pg_rewind: true
  use_slots: false
ttl: 30

Patroni log files

2024-01-05 10:34:04,557 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:10,222 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:10,223 WARNING: postgresql parameter wal_level=minimal failed validation, defaulting to hot_standby
2024-01-05 10:34:10,228 INFO: Changed huge_pages from off to False (restart might be required)
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_count from 0 to 9
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_idle from 0 to 7200
2024-01-05 10:34:10,228 INFO: Changed tcp_keepalives_interval from 0 to 75
2024-01-05 10:34:10,228 INFO: Changed wal_level from logical to replica (restart might be required)
2024-01-05 10:34:10,709 INFO: Reloading PostgreSQL configuration.
server signaled
2024-01-05 10:34:20,229 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:20,830 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:20,834 INFO: Changed huge_pages from off to False (restart might be required)
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_count from 0 to 9
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_idle from 0 to 7200
2024-01-05 10:34:20,834 INFO: Changed tcp_keepalives_interval from 0 to 75
2024-01-05 10:34:20,834 WARNING: Removing invalid parameter `hugepages-2Mi` from postgresql.parameters
2024-01-05 10:34:21,409 INFO: Reloading PostgreSQL configuration.
server signaled
2024-01-05 10:34:21,442 INFO: Lock owner: flowone-cluster-flowone-db-instanceset-tv2m-0; I am flowone-cluster-flowone-db-instanceset-tv2m-0
2024-01-05 10:34:21,888 INFO: closed patroni connection to the postgresql cluster
2024-01-05 10:34:22,413 INFO: establishing a new patroni connection to the postgres cluster
2024-01-05 10:34:22.788 UTC [106976] LOG:  pgaudit extension initialized
2024-01-05 10:34:22,788 INFO: postmaster pid=106976
/tmp/postgres:5432 - no response
2024-01-05 10:34:22,815 INFO: establishing a new patroni connection to the postgres cluster
2024-01-05 10:34:22.877 UTC [106976] LOG:  redirecting log output to logging collector process
2024-01-05 10:34:22.877 UTC [106976] HINT:  Future log output will appear in directory "log".
2024-01-05 10:34:23,185 INFO: establishing a new patroni connection to the postgres cluster
/tmp/postgres:5432 - accepting connections
/tmp/postgres:5432 - accepting connections
2024-01-05 10:34:23,954 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:25,162 INFO: not proceeding with the restart: pending restart flag is not set
2024-01-05 10:34:33,948 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:43,945 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:34:53,948 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock
2024-01-05 10:35:03,948 INFO: no action. I am (flowone-cluster-flowone-db-instanceset-tv2m-0), the leader with the lock

PostgreSQL log files

2024-01-05 10:54:51.402 UTC [113769] FATAL:  the database system is shutting down
2024-01-05 10:54:51.412 UTC [113770] FATAL:  the database system is shutting down
2024-01-05 10:54:51.437 UTC [113765] FATAL:  the database system is shutting down
2024-01-05 10:54:51.535 UTC [107604] LOG:  database system is shut down
2024-01-05 10:54:52.570 UTC [113835] LOG:  starting PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20), 64-bit
2024-01-05 10:54:52.571 UTC [113835] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-01-05 10:54:52.571 UTC [113835] LOG:  listening on IPv6 address "::", port 5432
2024-01-05 10:54:52.577 UTC [113835] LOG:  listening on Unix socket "/tmp/postgres/.s.PGSQL.5432"
2024-01-05 10:54:52.615 UTC [113847] LOG:  database system was shut down at 2024-01-05 10:54:51 UTC
2024-01-05 10:54:52.652 UTC [113835] LOG:  database system is ready to accept connections
2024-01-05 10:54:55.740 UTC [113884] LOG:  could not receive data from client: Connection reset by peer
2024-01-05 10:54:55.740 UTC [113880] LOG:  could not receive data from client: Connection reset by peer
2024-01-05 10:54:55.740 UTC [113878] LOG:  could not receive data from client: Connection reset by peer
2024-01-05 10:54:58.984 UTC [113900] LOG:  could not receive data from client: Connection reset by peer

Have you tried to use GitHub issue search?

  • Yes

Anything else we need to know?

No response

I bet it's the Crunchy operator that reverts it. Patroni never changes the global configuration on its own.
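
If so, the value the operator feeds into Patroni's dynamic configuration can be inspected on the Kubernetes side. A minimal sketch, assuming the oc CLI and the namespace/cluster name from the manifest above:

```
# Show the wal_level the operator has in the PostgresCluster spec
oc -n fo-ns get postgrescluster cluster \
  -o jsonpath='{.spec.patroni.dynamicConfiguration.postgresql.parameters.wal_level}'
```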