PostgreSQL glibc issue after migrating from CentOS to Rocky Linux
avmanojkumar1201 opened this issue
What happened?
While migrating the OS from CentOS to Rocky Linux, we see indexes becoming invalid, and we have to rebuild them to make them valid and healthy again.
Source OS : CentOS Linux release 7.9.2009
Target OS : Rocky Linux release 8.4
Postgres Version : 14.4
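Not every index is affected: only indexes whose ordering comes from glibc locale data need attention. A minimal sketch for narrowing the list, assuming (index, collation name, collation provider) rows produced by a catalog query that joins `pg_index.indcollation` with `pg_collation`, in line with the approach on the PostgreSQL wiki page linked at the end of this thread. The helper and index names are hypothetical:

```python
from typing import Iterable, NamedTuple


class IndexCollation(NamedTuple):
    """One (index, collation) pair, e.g. a row from a pg_index/pg_collation join."""
    index_name: str
    collation_name: str
    collation_provider: str  # pg_collation.collprovider: 'd' (default), 'c' (libc), 'i' (ICU)


# Byte-wise collations whose ordering does not depend on glibc locale data.
LOCALE_INDEPENDENT = {"C", "POSIX"}


def libc_dependent_indexes(rows: Iterable[IndexCollation]) -> list[str]:
    """Return the sorted, de-duplicated names of indexes that rely on glibc collation.

    Assumes the database default collation ('d') is a libc locale such as
    en_US.UTF-8, as set through the initdb options of this cluster.
    """
    affected = set()
    for row in rows:
        if row.collation_provider in ("d", "c") and row.collation_name not in LOCALE_INDEPENDENT:
            affected.add(row.index_name)
    return sorted(affected)


rows = [
    IndexCollation("orders_customer_name_idx", "default", "d"),  # hypothetical
    IndexCollation("orders_code_c_idx", "C", "c"),               # hypothetical
    IndexCollation("orders_title_icu_idx", "und-x-icu", "i"),    # hypothetical
]
print(libc_dependent_indexes(rows))  # ['orders_customer_name_idx']
```

Indexes using ICU collations or the C/POSIX collations are left out, because their ordering does not change with the glibc upgrade.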
How can we reproduce it (as minimally and precisely as possible)?
Migrate the OS from CentOS to Rocky Linux.
What did you expect to happen?
How can we plan the migration so that indexes do not become unhealthy, without having to rebuild them?
Patroni/PostgreSQL/DCS version
- Patroni version: 2.1.3
- PostgreSQL version: 14.4
- DCS (and its version): etcd 3.5.0
Patroni configuration file
```yaml
scope: xxxx
name: xxxx
namespace: /service/

restapi:
  listen: xxxx:xxxx
  connect_address: xxxx:xxxx

log:
  level: DEBUG
  dir: /data1/pglogs/patroni
  file_num: 4
  file_size: 262144000 #250 MB

etcd:
  hosts: xxxx:xxxx,xxxx:xxxx

bootstrap:
  method: initdb
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 5242880
    master_start_timeout: 300
    synchronous_mode: false
    synchronous_mode_strict: false
    synchronous_node_count: 1
    # standby_cluster:
    #   host: 127.0.0.1
    #   port: 1111
    #   primary_slot_name: patroni
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 200
        max_locks_per_transaction: 64
        max_worker_processes: 8
        max_prepared_transactions: 0
        wal_level: replica
        wal_log_hints: on
        track_commit_timestamp: off
        hot_standby: on
        archive_mode: on
  initdb: # List options to be passed on to initdb
    - encoding: UTF8
    - locale: en_US.UTF-8
    - data-checksums
  pg_hba: # Add following lines to pg_hba.conf after running 'initdb'
    #- host replication replicator 127.0.0.1/32 md5
    #- host all all 0.0.0.0/0 md5
    - hostssl replication replicator all md5
    - hostssl all postgres localhost md5
    - hostssl all postgres all md5
    - local all postgres peer
    - local all all md5
    - hostssl all all all md5

postgresql:
  listen: xxxx:xxxx
  connect_address: xxxx:xxxx
  use_unix_socket: true
  data_dir: /data1/pgdata/data
  bin_dir: /usr/pgsql-14/bin
  # config_dir: /data1/pgdata/config
  pgpass: xxxx
  authentication:
    replication:
      username: xxxx
      password: xxxx
    superuser:
      username: xxxx
      password: xxxx
    # rewind: # Has no effect on postgres 10 and lower
    #   username: xxxx
    #   password: xxxx
  parameters:
    unix_socket_directories: xxxx
    stats_temp_directory: /data1/pgdata/tmp
    superuser_reserved_connections: 5
    huge_pages: try
    shared_buffers: 3072MB
    work_mem: 512MB
    maintenance_work_mem: 768MB
    effective_cache_size: 9126MB
    checkpoint_timeout: 5min
    checkpoint_completion_target: 0.9
    min_wal_size: 80MB
    max_wal_size: 1GB
    wal_buffers: 16MB
    default_statistics_target: 100
    seq_page_cost: 1
    random_page_cost: 4
    effective_io_concurrency: 2
    synchronous_commit: on
    autovacuum: on
    autovacuum_max_workers: 5
    autovacuum_vacuum_scale_factor: 0.01
    autovacuum_analyze_scale_factor: 0.02
    autovacuum_vacuum_cost_limit: 200
    autovacuum_vacuum_cost_delay: 20
    autovacuum_naptime: 1s
    max_files_per_process: 4096
    archive_timeout: 1800s
    archive_command: pgbackrest --stanza=xxxx archive-push %p
    #archive_command: /bin/true
    wal_keep_size: 10240MB
    wal_keep_segments: 64
    max_wal_senders: 10
    max_replication_slots: 10
    shared_preload_libraries: pg_stat_statements
    cron.host: ''
    cron.database_name: 'postgres'
    pg_stat_statements.max: 10000
    pg_stat_statements.track: all
    pg_stat_statements.save: off
    auto_explain.log_min_duration: 10s
    auto_explain.log_analyze: true
    auto_explain.log_buffers: true
    auto_explain.log_timing: false
    auto_explain.log_triggers: true
    auto_explain.log_verbose: true
    auto_explain.log_nested_statements: true
    track_io_timing: on
    log_lock_waits: on
    log_temp_files: 0
    track_activities: on
    track_counts: on
    track_functions: all
    log_checkpoints: on
    logging_collector: on
    log_truncate_on_rotation: on
    log_rotation_age: 1d
    log_rotation_size: 1GB
    log_line_prefix: '%t [%p-%l] %r %q%u@%d '
    log_filename: 'postgresql-%a.log'
    log_directory: /data1/pglogs/postgres
    log_connections: on
    log_disconnections: on
    log_statements: on
    log_file_mode: 0644
    ssl: 'on'
    ssl_ca_file: xxxx
    ssl_cert_file: xxxx
    ssl_key_file: xxxx
  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: false
  # callbacks:
  #   on_start:
  #   on_stop:
  #   on_restart:
  #   on_reload:
  #   on_role_change:
  create_replica_methods:
    - basebackup
  basebackup:
    max-rate: '1024M'
    checkpoint: 'fast'

watchdog:
  mode: off # Allowed values: off, automatic, required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
  # specify a node to replicate from (cascading replication)
  # replicatefrom: (node name)
```
patronictl show-config
```yaml
loop_wait: 10
master_start_timeout: 300
maximum_lag_on_failover: 5242880
postgresql:
  parameters:
    archive_mode: true
    hot_standby: true
    max_connections: 200
    max_locks_per_transaction: 64
    max_prepared_transactions: 0
    max_worker_processes: 8
    track_commit_timestamp: false
    wal_level: replica
    wal_log_hints: true
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
synchronous_mode: false
synchronous_mode_strict: false
synchronous_node_count: 1
ttl: 30
```
Patroni log files
NA
PostgreSQL log files
```
WARNING: Failed to check index xxxx.xxxx: item xxxx invariant violated for index "xxxx"
WARNING: Failed to check index xxxx.xxxx: item xxxx invariant violated for index "xxxx"
```
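The "invariant violated" wording in these warnings matches the B-tree checks of the `amcheck` extension, so the warnings appear to come from a check built on it. A minimal sketch for generating one `bt_index_check` call per suspect B-tree index, assuming the extension is installed (`CREATE EXTENSION amcheck`) and the index names are already known; the helper names and the index name in the usage line are hypothetical:

```python
def quote_literal(value: str) -> str:
    """Quote a value as a SQL string literal (assumes standard_conforming_strings = on)."""
    return "'" + value.replace("'", "''") + "'"


def amcheck_statements(index_names: list[str], heapallindexed: bool = False) -> list[str]:
    """Build one amcheck B-tree verification statement per index name.

    bt_index_check() only accepts B-tree indexes. With heapallindexed=True it
    also verifies that every heap tuple has a matching index entry, at extra cost.
    """
    flag = "true" if heapallindexed else "false"
    return [
        f"SELECT bt_index_check({quote_literal(name)}::regclass, {flag});"
        for name in index_names
    ]


for statement in amcheck_statements(["public.orders_customer_name_idx"]):
    print(statement)  # SELECT bt_index_check('public.orders_customer_name_idx'::regclass, false);
```

Passing the name through a `::regclass` cast lets the same string carry a schema-qualified index name.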
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
From our investigation, this issue occurs when glibc is upgraded beyond 2.17 (Rocky Linux 8 ships glibc 2.28, which brought a major update to the locale collation data).
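To make the version boundary concrete: CentOS 7 ships glibc 2.17, and the large collation change arrived with glibc 2.28, the version in Rocky Linux 8. A minimal sketch that flags whether a given glibc upgrade crosses that boundary; the function names are hypothetical, and smaller collation changes in other glibc releases are deliberately not covered:

```python
# glibc 2.28 shipped a major update of the locale collation data.
COLLATION_UPDATE = (2, 28)


def parse_glibc_version(version: str) -> tuple[int, int]:
    """Turn a 'major.minor' glibc version string such as '2.17' into (2, 17)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def crosses_collation_update(old: str, new: str) -> bool:
    """True if upgrading glibc from `old` to `new` crosses the 2.28 locale data update.

    Only the 2.28 boundary is checked; other releases also made smaller
    collation changes, so False does not prove the upgrade is safe.
    """
    return parse_glibc_version(old) < COLLATION_UPDATE <= parse_glibc_version(new)


print(crosses_collation_update("2.17", "2.28"))  # True: CentOS 7 -> Rocky Linux 8
```

On each host, `ldd --version` or Python's `platform.libc_ver()` reports the installed glibc version.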
Would we see any performance issues if we kept glibc 2.17, and how can we keep the same glibc version on the target machine during the OS migration? Could you provide detailed steps?
- How can we keep the same glibc version (2.17) on both CentOS and Rocky Linux, even though a glibc upgrade is a normal part of the OS migration? Please provide detailed steps.
- If we enforce the same glibc version during the OS migration, will it cause performance issues or bottlenecks for PostgreSQL or the OS?
- How can we avoid rebuilding indexes? Even with the CONCURRENTLY option, critical applications can suffer an outage, because the indexes remain invalid until they are rebuilt.
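Where a rebuild cannot be avoided, one way to limit its impact is to rebuild only the indexes that actually depend on libc collations, one statement at a time. A minimal sketch, assuming the affected (schema, index) pairs are already known; the helper names and the index name below are hypothetical:

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier by wrapping it in double quotes and doubling embedded ones."""
    return '"' + name.replace('"', '""') + '"'


def reindex_statements(indexes: list[tuple[str, str]]) -> list[str]:
    """Build one REINDEX INDEX CONCURRENTLY statement per (schema, index) pair.

    REINDEX ... CONCURRENTLY (PostgreSQL 12+) cannot run inside a transaction
    block, so each statement is meant to be executed on its own, in autocommit mode.
    """
    return [
        f"REINDEX INDEX CONCURRENTLY {quote_ident(schema)}.{quote_ident(index)};"
        for schema, index in indexes
    ]


for statement in reindex_statements([("public", "orders_customer_name_idx")]):
    print(statement)  # REINDEX INDEX CONCURRENTLY "public"."orders_customer_name_idx";
```

If a concurrent rebuild fails, it can leave behind an invalid index with a `_ccnew` suffix, which has to be dropped manually.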
How do you see this as related to "Patroni bugs" (or to Patroni at all), which is what the issues here are for?
P.S. There are quite a few articles and conference talks about approaches to solving your problem. You may also want to check the approach used in Spilo.
@avmanojkumar1201 please make yourself familiar with https://wiki.postgresql.org/wiki/Locale_data_changes