EnterpriseDB / repmgr

A lightweight replication manager for PostgreSQL (Postgres)

Home Page:https://repmgr.org/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

standby falsely promoted after cloning from PITR recovered primary

Bschitter opened this issue · comments

I have a database setup with 3 nodes:

  1. primary
  2. standby
  3. backup/repository/archive node.

PostgreSQL: 15.1
repmgr: 5.3.3
pgbackrest: 2.43

When I do a PITR with pgbackrest to the primary node, then pgbackrest will write the following to the postgresql.auto.conf file:

recovery_target_time = '2023-01-27 13:30:00.000000+01'
recovery_target_action = 'promote'

When I now want to clone my standby with repmgr standby clone from the primary and join it to the cluster, it will also copy the postgresql.auto.conf file. repmgr will add the primary_conninfo entry to postgresql.auto.conf, but does not delete old entries from it. And after starting the standby database, it will correctly start as standby since the standby.signal is present, but will then replay WALs from the archive and promote itself, since the recovery_target_action is still promote.

We didn't have this issue with PostgreSQL 11, repmgr 5.0.0 and pgbackrest 2.18, when there was still the recovery.conf file.

Am I doing something wrong?
Or is cloning the standby after a PITR not supported by repmgr?

Since it says not to edit the postgresql.auto.conf, I prefer not to edit or delete it.

I also found this somewhat related issue #543, but cannot tell what exactly was done about it.

I found out that I can use the ALTER SYSTEM command to reset configuration properties inside postgresql.auto.conf, see
https://www.postgresql.org/docs/15/sql-altersystem.html

So after a restore and before executing repmgr standby clone I will execute ALTER SYSTEM RESET ALL; to remove configuration properties inside postgresql.auto.conf inserted by pgbackrest.
This solves my issue.

But still I'm thinking that it would be nice if repmgr would show a warning, a hint or even refuse to clone, if something will be preventing the database from starting in standby mode properly.
I understand however that the documentation says:

Note that by default, all configuration files in the source node's data directory will be copied to the cloned node. Typically these will be postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf. These may require modification before the standby is started.