FATAL: could not connect to the primary server: connection to server at "x.x.x.x", port 5432 failed: session is read-only
novanbramantya opened this issue
What happened?
I am setting up a new Patroni cluster to serve as a rollback target, and I want it to replicate from a standby Patroni cluster that has not been promoted yet. After the basebackup step finishes, replication does not continue; instead PostgreSQL reports a FATAL error saying the session is read-only.
How can we reproduce it (as minimally and precisely as possible)?
I have an existing Patroni PostgreSQL cluster in one cloud provider and need to migrate it to another cloud provider. Replication to the new provider is done, and the cluster there is running as a standby leader. Next I need to set up the rollback Patroni cluster, but when I start the Patroni service on it I get this error:
FATAL: could not connect to the primary server: connection to server at "x.x.x.x", port 5432 failed: session is read-only
Most likely Patroni doesn't allow replicating from a replica instead of the leader.
What did you expect to happen?
I need to replicate the data from the standby Patroni cluster, which stays read-only until it is promoted, to the rollback cluster.
Patroni/PostgreSQL/DCS version
- Patroni version: patroni 3.2.2
- PostgreSQL version: 12.18
- DCS (and its version): 3.4
Patroni configuration file
namespace: patroni
scope: test_rollback_postgres
name: test-rollback-postgres-n1
etcd3:
  url: https://test-patroni-etcd.service.consul:2379
  cert: /etc/etcd/etcd.pem
  key: /etc/etcd/etcd-key.pem
  cacert: /etc/etcd/root-ca.pem
postgresql:
  bin_dir: /usr/lib/postgresql/12/bin
  use_unix_socket: true
  listen: 0.0.0.0:5432
  config_dir: /etc/postgresql/patroni
  data_dir: /data/postgres/data/
  pgpass: /var/lib/postgresql/.pgpass
  pg_ctl_timeout: 60
  connect_address: x.x.x.x:5432
  authentication:
    superuser:
      username: admin_user
      password: admin_pass
    replication:
      username: repl_user
      password: repl_pass
    rewind:
      username: repl_user
      password: repl_pass
  basebackup:
    - progress
    - slot: "rollback_master"
    - verbose
  parameters:
    work_mem: 8MB
    archive_timeout: 60
    archive_command: "/usr/local/bin/wal-g-push-wal.sh %p"
    checkpoint_completion_target: 0.7
    checkpoint_timeout: 15min
    hot_standby_feedback: on
    log_checkpoints: on
    log_destination: 'stderr'
    log_directory: '/var/log/postgresql'
    log_file_mode: 0600
    log_filename: 'postgresql-patroni.log'
    log_line_prefix: '%t [%p] %q%u@%d %h '
    log_rotation_age: 0
    log_rotation_size: 0
    log_min_duration_statement: 100
    maintenance_work_mem: 512MB
    log_statement: 'ddl'
    log_timezone: 'Asia/Jakarta'
    timezone: 'Asia/Jakarta'
    max_wal_size: 1GB
    min_wal_size: 80MB
    ssl: off
  pg_hba:
bootstrap:
  dcs:
    standby_cluster:
      host: x.x.x.x
      port: 5432
      primary_slot_name: rollback_master
      create_replica_methods:
        - basebackup
        - slot: "rollback_master"
        - progress
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 50000000
    master_start_timeout: 300
    synchronous_mode: false
    synchronous_mode_strict: false
    check_timeline: false
    ignore_slots:
      - type: logical
        plugin: wal2json
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 1024
        max_locks_per_transaction: 64
        max_worker_processes: 8
        max_prepared_transactions: 0
        wal_level: logical
        wal_log_hints: on
        track_commit_timestamp: off
        shared_preload_libraries: pg_stat_statements,pglogical,wal2json,pg_partman_bgw
        archive_mode: on
        shared_buffers: 1GB
        pg_stat_statements.track: all
        hot_standby: on
        logging_collector: on
        log_truncate_on_rotation: off
        log_lock_waits: on
        wal_keep_segments: 100
        max_wal_senders: 15
        max_replication_slots: 20
        pg_partman_bgw.interval: 3600
        pg_partman_bgw.dbname: 'testdb'
  initdb:
    - encoding: UTF8
    - data-checksums
  users:
    admin_user:
      password: admin_pass
    repl_user:
      password: repl_pass
      options:
        - replication
watchdog:
  mode: automatic
  device: /dev/watchdog
  safety_margin: 5
log:
  level: INFO
  traceback_level: ERROR
restapi:
  listen: 0.0.0.0:8008
  connect_address: x.x.x.x:8008
tags:
  nofailover: false
  noloadbalance: false
patronictl show-config
check_timeline: false
ignore_slots:
- plugin: wal2json
  type: logical
loop_wait: 10
master_start_timeout: 300
maximum_lag_on_failover: 50000000
postgresql:
  parameters:
    archive_mode: true
    hot_standby: true
    log_lock_waits: true
    log_truncate_on_rotation: false
    logging_collector: true
    max_connections: 1024
    max_locks_per_transaction: 64
    max_prepared_transactions: 0
    max_replication_slots: 20
    max_wal_senders: 15
    max_worker_processes: 8
    pg_partman_bgw.dbname: testdb
    pg_partman_bgw.interval: 3600
    pg_stat_statements.track: all
    shared_buffers: 1GB
    shared_preload_libraries: pg_stat_statements,pglogical,wal2json,pg_partman_bgw
    track_commit_timestamp: false
    wal_keep_segments: 100
    wal_level: logical
    wal_log_hints: true
  use_pg_rewind: true
  use_slots: true
retry_timeout: 10
standby_cluster:
  create_replica_methods:
  - basebackup
  - slot: rollback_master
  - progress
  host: x.x.x.x
  port: 5432
  primary_slot_name: rollback_master
synchronous_mode: false
synchronous_mode_strict: false
ttl: 30
Patroni log files
: self.handle_one_request()
: File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 1338, in handle_one_request
: BaseHTTPRequestHandler.handle_one_request(self)
: File "/usr/lib/python3.10/http/server.py", line 421, in handle_one_request
: method()
: File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 446, in do_GET_patroni
: self._write_status_response(200, response)
: File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 218, in _write_status_response
: self._write_json_response(status_code, response)
: File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 167, in _write_json_response
: self.write_response(status_code, json.dumps(response, default=str), content_type='application/json')
: File "/usr/local/lib/python3.10/dist-packages/patroni/api.py", line 157, in write_response
: self.wfile.write(body.encode('utf-8'))
: File "/usr/lib/python3.10/socketserver.py", line 826, in write
: self._sock.sendall(b)
: BrokenPipeError: [Errno 32] Broken pipe
PostgreSQL log files
2024-03-28 12:25:14 WIB [23648] LOG: database system is ready to accept read only connections
2024-03-28 12:25:15 WIB [23686] FATAL: could not connect to the primary server: connection to server at "x.x.x.x", port 5444 failed: session is read-only
2024-03-28 12:25:16 WIB [23747] FATAL: could not connect to the primary server: connection to server at "x.x.x.x", port 5444 failed: session is read-only
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
No response
> Most likely Patroni doesn't allow replicating from a replica instead of the leader.
To be precise, the standby leader wants to replicate from the primary. Yes.
> I need to replicate the data from the standby Patroni cluster, which stays read-only until it is promoted, to the rollback cluster.
There is no real need to do that. A better approach is to gracefully convert the cluster in DC1 to a standby, as described here: #1660 (comment)
In fact, you can put host and port in the standby_cluster section in DC1 right away, so that once the standby cluster in DC2 is promoted, the cluster in DC1 will start replicating.
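As a minimal sketch of that suggestion, the dynamic configuration of the DC1 cluster (edited, for example, with `patronictl edit-config`) would gain a section like the one below. The address is a placeholder for the DC2 endpoint and the port is an assumption; the exact conversion steps are the ones described in the linked #1660 comment, not this fragment.

```yaml
# Hypothetical fragment for the DC1 cluster's dynamic configuration.
# host/port are placeholders: point them at the DC2 cluster's endpoint.
standby_cluster:
  host: x.x.x.x
  port: 5432
```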
Thanks for the explanation, I really appreciate it!
But in my use case the service has a very large transaction load. If I immediately set up fail-forward replication from DC2 to DC1, it may work, but it may not, and that is hard to guarantee.
So instead, we create a separate rollback cluster in DC1, apart from the existing cluster in DC1.
To give some context, this is a migration project between DC1 and DC2, so the topology is:
existing Patroni cluster DC1 (leader) -> standby Patroni cluster DC2 (standby leader, replica of the existing DC1 leader) -> rollback Patroni cluster DC1 (standby leader, replica of the DC2 standby leader)
The two DCs have separate DCS instances.
With this topology, if anything goes wrong in DC2, we can immediately promote the rollback cluster in DC1, and we don't need to worry about missing WAL, because from the beginning there is a replication slot between DC2 and the rollback cluster in DC1.
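To make the chain concrete, here is a rough sketch of what the `standby_cluster` sections of the two downstream clusters would contain. Only the slot name `rollback_master` comes from the configuration posted in this issue; the host names and ports are placeholders. The second hop (rollback cluster replicating from the DC2 standby leader) is the one that fails with "session is read-only" in this report.

```yaml
# Standby cluster in DC2: replicates from the existing DC1 leader.
standby_cluster:
  host: "dc1-leader.example"        # placeholder address
  port: 5432                        # placeholder port
---
# Rollback cluster in DC1: replicates from the DC2 standby leader.
standby_cluster:
  host: "dc2-standby-leader.example"  # placeholder address
  port: 5432                          # placeholder port
  primary_slot_name: rollback_master  # slot name taken from the posted config
```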
Thanks
Hi @CyberDem0n, sorry to tag you again. Regarding this PR, is there any information about when it will be approved? Thanks!