EnterpriseDB / repmgr

A lightweight replication manager for PostgreSQL (Postgres)

Home Page: https://repmgr.org/

Is this a bug on standby promotion?

jiangjcsnapon opened this issue

We have a configuration with two nodes and one witness registered in repmgr 5.4dev.

I am testing a split-brain scenario by breaking the connection between the primary and the standby, while the witness can still see the primary.

The standby promoted itself.

[2023-07-28 13:57:23] [INFO] checking state of node "PrimaryServer" (ID: 1), 6 of 6 attempts
[2023-07-28 13:57:25] [WARNING] unable to ping "user=repmgr connect_timeout=2 dbname=repmgr host=PrimaryServer port=5432 fallback_application_name=repmgr"
[2023-07-28 13:57:25] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2023-07-28 13:57:25] [WARNING] unable to reconnect to node "PrimaryServer" (ID: 1) after 6 attempts
[2023-07-28 13:57:25] [INFO] 1 active sibling nodes registered
[2023-07-28 13:57:25] [INFO] 3 total nodes registered
[2023-07-28 13:57:25] [INFO] primary node "PrimaryServer" (ID: 1) and this node have the same location ("default")
[2023-07-28 13:57:25] [INFO] local node's last receive lsn: 0/5E623340
[2023-07-28 13:57:25] [INFO] checking state of sibling node "WitnessServer" (ID: 3)
[2023-07-28 13:57:25] [INFO] node "WitnessServer" (ID: 3) reports its upstream is node 1, last seen 1 second(s) ago
[2023-07-28 13:57:25] [NOTICE] witness node "WitnessServer" (ID: 3) last saw primary node 1 second(s) ago, considering primary still visible
[2023-07-28 13:57:25] [INFO] 1 nodes can see the primary
[2023-07-28 13:57:25] [DETAIL] following nodes can see the primary:

  • node "WitnessServer" (ID: 3): 1 second(s) ago

[2023-07-28 13:57:25] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2023-07-28 13:57:25] [NOTICE] promotion candidate is "StandbyServer" (ID: 2)
[2023-07-28 13:57:25] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2023-07-28 13:57:25] [INFO] promote_command is:
"/usr/pgsql-14/bin/repmgr standby promote -f /etc/repmgr/14/repmgr.conf --log-to-file"
[2023-07-28 13:57:25] [NOTICE] redirecting logging output to "/var/log/repmgr/repmgrd.log"

[2023-07-28 13:57:27] [WARNING] 1 sibling nodes found, but option "--siblings-follow" not specified
[2023-07-28 13:57:27] [DETAIL] these nodes will remain attached to the current primary:
WitnessServer (node ID: 3, witness server)
[2023-07-28 13:57:27] [NOTICE] promoting standby to primary
[2023-07-28 13:57:27] [DETAIL] promoting server "StandbyServer" (ID: 2) using pg_promote()
[2023-07-28 13:57:27] [NOTICE] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2023-07-28 13:57:28] [NOTICE] STANDBY PROMOTE successful
[2023-07-28 13:57:28] [DETAIL] server "StandbyServer" (ID: 2) was successfully promoted to primary
[2023-07-28 13:57:28] [INFO] checking state of node 2, 1 of 6 attempts
[2023-07-28 13:57:28] [NOTICE] node 2 has recovered, reconnecting
[2023-07-28 13:57:28] [INFO] connection to node 2 succeeded
[2023-07-28 13:57:28] [INFO] original connection is still available
[2023-07-28 13:57:28] [INFO] 1 followers to notify
[2023-07-28 13:57:28] [NOTICE] notifying node "WitnessServer" (ID: 3) to follow node 2
INFO: node 3 received notification to follow node 2
[2023-07-28 13:57:28] [INFO] switching to primary monitoring mode
[2023-07-28 13:57:28] [NOTICE] monitoring cluster primary "StandbyServer" (ID: 2)

The standby cannot see the primary, but it can see the witness, and the witness can see the primary. The standby should not promote itself in this case: if the witness can still see the primary, the primary may well still be up.
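
To make the behaviour I expected concrete, here is a minimal Python sketch of the decision rule. It is only a model of the rule described above, not repmgr's actual code: the names `SiblingReport`, `should_promote` and `visibility_window` are hypothetical, and the 4-second window is simply the figure that appears in the log line "within the last 4 seconds".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SiblingReport:
    """What one sibling node (standby or witness) reports about the primary."""
    name: str
    # Seconds since this sibling last saw the primary; None if unknown.
    primary_last_seen: Optional[int]


def should_promote(local_sees_primary: bool,
                   siblings: List[SiblingReport],
                   visibility_window: int) -> bool:
    """Return True only if neither this node nor any sibling saw the primary recently.

    Hypothetical rule: one recent sighting by any sibling is enough to cancel promotion.
    """
    if local_sees_primary:
        return False
    for sibling in siblings:
        seen = sibling.primary_last_seen
        if seen is not None and seen <= visibility_window:
            # Some other node still sees the primary, so it may still be up.
            return False
    return True


# Values taken from the log: the witness saw the primary 1 second ago.
witness = SiblingReport("WitnessServer", primary_last_seen=1)
print(should_promote(False, [witness], visibility_window=4))  # prints False
```

With the values from the log, this rule would cancel the promotion, whereas the standby in my test went ahead and promoted itself.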