PostgreSQL glibc issue after OS migration from CentOS to Rocky Linux

Question

avmanojkumar1201 opened this issue 3 months ago

What happened?

While migrating the OS from CentOS to Rocky Linux, we see indexes becoming invalid, and we need to rebuild them to make them valid and healthy again.

Source OS : CentOS Linux release 7.9.2009
Target OS : Rocky Linux release 8.4
Postgres Version : 14.4

How can we reproduce it (as minimally and precisely as possible)?

Migrate the OS from CentOS to Rocky Linux.

What did you expect to happen?

How can we plan the migration so that indexes do not become unhealthy, without having to rebuild them?

Patroni/PostgreSQL/DCS version

Patroni version: 2.1.3
PostgreSQL version: 14.4
DCS (and its version): etcd 3.5.0

Patroni configuration file

scope: xxxx
name: xxxx
namespace: /service/

restapi:

  listen: xxxx:xxxx

  connect_address: xxxx:xxxx
log:
  level: DEBUG
  dir: /data1/pglogs/patroni
  file_num: 4
  file_size: 262144000 #250 MB
etcd:
  hosts: xxxx:xxxx,xxxx:xxxx
bootstrap:
  method: initdb
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 5242880
    master_start_timeout: 300
    synchronous_mode: false
    synchronous_mode_strict: false
    synchronous_node_count: 1
    # standby_cluster:
      # host: 127.0.0.1
      # port: 1111
      # primary_slot_name: patroni
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 200
        max_locks_per_transaction: 64
        max_worker_processes: 8
        max_prepared_transactions: 0
        wal_level: replica
        wal_log_hints: on
        track_commit_timestamp: off
        hot_standby: on
        archive_mode: on

  initdb:  # List options to be passed on to initdb
    - encoding: UTF8
    - locale: en_US.UTF-8
    - data-checksums


  pg_hba:  # Add following lines to pg_hba.conf after running 'initdb'
    #- host replication replicator 127.0.0.1/32 md5
    #- host all all 0.0.0.0/0 md5

    - hostssl replication     replicator all                md5
    - hostssl all     postgres localhost                md5
    - hostssl all     postgres all                md5
    - local   all             postgres                    peer
    - local   all             all                                   md5

    - hostssl all             all                all                md5



postgresql:

  listen: xxxx:xxxx

  connect_address: xxxx:xxxx
  use_unix_socket: true
  data_dir: /data1/pgdata/data
  bin_dir: /usr/pgsql-14/bin
# config_dir: /data1/pgdata/config
  pgpass: xxxx
  authentication:
    replication:
      username: xxxx
      password: xxxx
    superuser:
      username: xxxx
      password: xxxx
#    rewind:  # Has no effect on postgres 10 and lower
#      username: xxxx
#      password: xxxx
  parameters:
    unix_socket_directories: xxxx
    stats_temp_directory: /data1/pgdata/tmp
    superuser_reserved_connections: 5
    huge_pages: try
    shared_buffers: 3072MB
    work_mem: 512MB
    maintenance_work_mem: 768MB
    effective_cache_size: 9126MB
    checkpoint_timeout: 5min
    checkpoint_completion_target: 0.9
    min_wal_size: 80MB
    max_wal_size: 1GB
    wal_buffers: 16MB
    default_statistics_target: 100
    seq_page_cost: 1
    random_page_cost: 4
    effective_io_concurrency: 2
    synchronous_commit: on
    autovacuum: on
    autovacuum_max_workers: 5
    autovacuum_vacuum_scale_factor: 0.01
    autovacuum_analyze_scale_factor: 0.02
    autovacuum_vacuum_cost_limit: 200
    autovacuum_vacuum_cost_delay: 20
    autovacuum_naptime: 1s
    max_files_per_process: 4096
    archive_timeout: 1800s
    archive_command: pgbackrest --stanza=xxxx archive-push %p
    #archive_command: /bin/true
    wal_keep_size: 10240MB
    wal_keep_segments: 64
    max_wal_senders: 10
    max_replication_slots: 10
    shared_preload_libraries: pg_stat_statements
    cron.host: ''
    cron.database_name: 'postgres'
    pg_stat_statements.max: 10000
    pg_stat_statements.track: all
    pg_stat_statements.save: off
    auto_explain.log_min_duration: 10s
    auto_explain.log_analyze: true
    auto_explain.log_buffers: true
    auto_explain.log_timing: false
    auto_explain.log_triggers: true
    auto_explain.log_verbose: true
    auto_explain.log_nested_statements: true
    track_io_timing: on
    log_lock_waits: on
    log_temp_files: 0
    track_activities: on
    track_counts: on
    track_functions: all
    log_checkpoints: on
    logging_collector: on
    log_truncate_on_rotation: on
    log_rotation_age: 1d
    log_rotation_size: 1GB
    log_line_prefix: '%t [%p-%l] %r %q%u@%d '
    log_filename: 'postgresql-%a.log'
    log_directory: /data1/pglogs/postgres
    log_connections: on
    log_disconnections: on
    log_statements: on
    log_file_mode: 0644
    ssl: 'on'
    ssl_ca_file: xxxx
    ssl_cert_file: xxxx
    ssl_key_file: xxxx
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: false

#  callbacks:
#    on_start:
#    on_stop:
#    on_restart:
#    on_reload:
#    on_role_change:
  create_replica_methods:
    - basebackup
  basebackup:
    max-rate: '1024M'
    checkpoint: 'fast'
watchdog:
  mode: off  # Allowed values: off, automatic, required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false



  # specify a node to replicate from (cascading replication)
#  replicatefrom: (node name)

patronictl show-config

loop_wait: 10
master_start_timeout: 300
maximum_lag_on_failover: 5242880
postgresql:
  parameters:
    archive_mode: true
    hot_standby: true
    max_connections: 200
    max_locks_per_transaction: 64
    max_prepared_transactions: 0
    max_worker_processes: 8
    track_commit_timestamp: false
    wal_level: replica
    wal_log_hints: true
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
synchronous_mode: false
synchronous_mode_strict: false
synchronous_node_count: 1
ttl: 30

Patroni log files

NA

PostgreSQL log files

WARNING:  Failed to check index xxxx.xxxx: item xxxx invariant violated for index "xxxx"

WARNING:  Failed to check index xxxx.xxxx: item xxxx invariant violated for index "xxxx"
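
One plausible reading of these warnings is that the B-tree entries were written in the sort order of the old collation, while lookups now compare keys with the new one. The sketch below is a toy model of that effect in plain Python, not PostgreSQL code: `old_rule` and `new_rule` are hypothetical stand-ins for two collation versions, and a sorted list with binary search stands in for the index.

```python
from bisect import bisect_left

def old_rule(s):
    # Hypothetical "old" collation: plain code point order (uppercase first).
    return s

def new_rule(s):
    # Hypothetical "new" collation: case-insensitive, code point as tie-break.
    return (s.lower(), s)

def build_index(values, rule):
    # A sorted list stands in for a B-tree built under the given rule.
    return sorted(values, key=rule)

def lookup(index, value, rule):
    # Binary search that trusts the list to be sorted according to `rule`.
    keys = [rule(v) for v in index]
    pos = bisect_left(keys, rule(value))
    return pos < len(index) and index[pos] == value

values = ["mango", "Zebra", "apple"]
index = build_index(values, old_rule)        # stored order: Zebra, apple, mango

print(lookup(index, "apple", old_rule))      # found: the rule matches the stored order
print(lookup(index, "apple", new_rule))      # missed: the search assumes a different order
print(lookup(build_index(values, new_rule), "apple", new_rule))  # found after a rebuild
```

The row is still in the list in every case; only the assumption about its ordering changed. In this toy model, rebuilding under the new rule is what makes lookups reliable again.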

Have you tried to use GitHub issue search?

Yes

Anything else we need to know?

From what we have found, this issue occurs when glibc is upgraded to a version newer than 2.17.

Will we see any performance issues if we keep glibc 2.17, and how can we keep the same glibc version across the OS migration?

Could you provide detailed steps for keeping the same glibc version on the target machine, and say whether doing so would cause any performance issues?

How can we keep the same glibc version, even though a glibc change is a legitimate part of an OS migration? (We would like to keep 2.17 on both CentOS and Rocky.) Please provide detailed steps.
If we enforce the same glibc version as part of the OS migration, will any performance issues or bottlenecks arise for Postgres or the OS?
How can we avoid the index rebuild? Even with the CONCURRENTLY option, it can cause an outage for critical applications, because the index becomes invalid.
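
For planning, it may help to confirm on each host which glibc is in use and whether the OS actually sorts text differently. The following is a minimal sketch using only the Python standard library; `SAMPLES`, `parse_version` and `collation_fingerprint` are hypothetical names chosen for illustration, and the sample strings are an arbitrary small set that cannot prove two hosts collate identically.

```python
import hashlib
import locale
import platform

# Hypothetical sample strings; case, punctuation and digits are typical
# places where collation rules differ between releases.
SAMPLES = ["a", "B", "a-b", "ab", "a b", "1-1", "11", "A", "b"]

def parse_version(text):
    """Turn '2.17' into (2, 17) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())

def collation_fingerprint(samples, locale_name):
    """Sort samples with the OS collation and hash the resulting order."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        # Raises locale.Error if the locale is not installed on this host.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        ordered = sorted(samples, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
    digest = hashlib.sha256("\n".join(ordered).encode("utf-8")).hexdigest()
    return ordered, digest

# C library name and version as reported by Python, e.g. ('glibc', '2.17').
libc_name, libc_version = platform.libc_ver()
print("libc:", libc_name, libc_version, parse_version(libc_version))
```

Calling `collation_fingerprint(SAMPLES, "en_US.UTF-8")` on the source and target hosts (the locale used by `initdb` in the configuration above) and comparing the digests gives a quick hint: differing digests mean the OS sort order changed for at least these strings, so indexes on text columns using that collation are suspect.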

Polina Bungina · Answer 1 · Tue Feb 27 2024 03:02:36 GMT+0800 (China Standard Time)

How do you see this as related to "Patroni bugs" (or to Patroni at all), which is what the issues here are for?

P.S. There are quite a few articles and conference talks about the approaches used to solve your problem. You may also want to check the approach used in Spilo.

Alexander Kukushkin · Answer 2 · Wed Feb 28 2024 13:18:20 GMT+0800 (China Standard Time)

@avmanojkumar1201 please make yourself familiar with https://wiki.postgresql.org/wiki/Locale_data_changes