Repmgr Fatal error - connection pointer is NULL
AnjanaRatnayake opened this issue
anjana commented
My setup for repmgr was a 3 node cluster using the following docker-compose:
```yaml
networks:
  my-network:
    driver: bridge
    ipam:
      config:
        - subnet: 10.89.0.0/24
services:
  pg-0:
    hostname: pg-0
    container_name: pg-0
    image: 'postgresql-repmgr:latest'
    environment:
      - POSTGRESQL_PASSWORD=custompassword
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_NODE_NETWORK_NAME=pg-0
      - REPMGR_NODE_NAME=pg-0
      - REPMGR_PARTNER_NODES=pg-0,pg-1,pg-2
      - POSTGRESQL_PASSWORD=p
      - POSTGRESQL_POSTGRES_PASSWORD=p
      - REPMGR_FAILOVER=automatic
      - REPMGR_LOG_LEVEL=DEBUG
    networks:
      my-network:
        ipv4_address: 10.89.0.18
  pg-1:
    hostname: pg-1
    container_name: pg-1
    image: 'postgresql-repmgr:latest'
    environment:
      - POSTGRESQL_PASSWORD=custompassword
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_NODE_NETWORK_NAME=pg-1
      - REPMGR_NODE_NAME=pg-1
      - REPMGR_PARTNER_NODES=pg-0,pg-1,pg-2
      - POSTGRESQL_PASSWORD=p
      - POSTGRESQL_POSTGRES_PASSWORD=p
      - REPMGR_FAILOVER=automatic
      - REPMGR_LOG_LEVEL=DEBUG
    networks:
      my-network:
        ipv4_address: 10.89.0.19
  pg-2:
    hostname: pg-2
    container_name: pg-2
    image: 'postgresql-repmgr:latest'
    environment:
      - POSTGRESQL_PASSWORD=custompassword
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_NODE_NETWORK_NAME=pg-2
      - REPMGR_NODE_NAME=pg-2
      - REPMGR_PARTNER_NODES=pg-0,pg-1,pg-2
      - POSTGRESQL_PASSWORD=p
      - POSTGRESQL_POSTGRES_PASSWORD=p
      - REPMGR_FAILOVER=automatic
      - REPMGR_LOG_LEVEL=DEBUG
    networks:
      my-network:
        ipv4_address: 10.89.0.20
```
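Each service's `environment` list sets `POSTGRESQL_PASSWORD` twice, first to `custompassword` and later to `p`. The sketch below is a hypothetical helper and is not part of repmgr or the image. It flags keys that repeat in a Compose-style list of `KEY=VALUE` strings. The list is copied by hand from the pg-0 service, because parsing the YAML directly would need a third-party library. The sketch does not try to predict which of the duplicate values actually takes effect.

```python
from collections import defaultdict


def find_duplicate_env(entries):
    """Return {KEY: [values in order]} for every key that appears more than
    once in a Compose-style list of "KEY=VALUE" strings."""
    seen = defaultdict(list)
    for entry in entries:
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = entry.partition("=")
        seen[key].append(value)
    return {key: values for key, values in seen.items() if len(values) > 1}


# Environment list of the pg-0 service, copied from the compose file.
pg0_env = [
    "POSTGRESQL_PASSWORD=custompassword",
    "REPMGR_PASSWORD=repmgrpassword",
    "REPMGR_PRIMARY_HOST=pg-0",
    "REPMGR_NODE_NETWORK_NAME=pg-0",
    "REPMGR_NODE_NAME=pg-0",
    "REPMGR_PARTNER_NODES=pg-0,pg-1,pg-2",
    "POSTGRESQL_PASSWORD=p",
    "POSTGRESQL_POSTGRES_PASSWORD=p",
    "REPMGR_FAILOVER=automatic",
    "REPMGR_LOG_LEVEL=DEBUG",
]

print(find_duplicate_env(pg0_env))
# {'POSTGRESQL_PASSWORD': ['custompassword', 'p']}
```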
I ran a workload against this cluster that sent transactions consisting of updates to each node in the cluster.
I was also using Antithesis to introduce packet loss and network latency, to test repmgr's resilience to network faults.
pg-2 crashed with the following error:
```
INFO: node 1001 received notification to follow node 1000
[2023-02-07 18:07:21] [ERROR] get_primary_node_id(): unable to execute query
[2023-02-07 18:07:21] [DETAIL] query text is:
SELECT node_id FROM repmgr.nodes WHERE type = 'primary' AND active IS TRUE
[2023-02-07 18:07:21] [ERROR] reset_node_voting_status(): local_conn not set
[2023-02-07 18:07:21] [DETAIL]
connection pointer is NULL
[2023-02-07 18:07:21] [ERROR] _get_node_record(): unable to execute query
[2023-02-07 18:07:21] [DETAIL] query text is:
SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1000
[2023-02-07 18:07:21] [ERROR] unable to retrieve record for upstream node (ID: 1000), terminating
```
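Read from top to bottom, the log shows a chain of failures. First, the query for the current primary fails. Next, the local connection turns out to be NULL. Finally, repmgr cannot fetch the record for the upstream node (ID 1000) and terminates.

The sketch below is a hypothetical helper, not a repmgr tool. It assumes the `[timestamp] [LEVEL] message` layout seen in these lines and pulls out the function names that prefix the ERROR entries. This makes the same sequence easier to spot in a longer DEBUG log.

```python
import re

# Assumed line layout: "[YYYY-MM-DD HH:MM:SS] [LEVEL] message" (message may be empty).
LINE_RE = re.compile(r"^\[(?P<ts>[\d-]+ [\d:]+)\] \[(?P<level>[A-Z]+)\] ?(?P<msg>.*)$")
# ERROR messages that start with "some_function(): ..."
FUNC_RE = re.compile(r"^(?P<func>\w+)\(\):")


def failing_functions(log_text):
    """Return, in order, the function names that prefix ERROR lines."""
    funcs = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match or match.group("level") != "ERROR":
            continue  # skip non-ERROR lines and continuation lines
        func = FUNC_RE.match(match.group("msg"))
        if func:
            funcs.append(func.group("func"))
    return funcs


sample = "\n".join([
    "[2023-02-07 18:07:21] [ERROR] get_primary_node_id(): unable to execute query",
    "[2023-02-07 18:07:21] [ERROR] reset_node_voting_status(): local_conn not set",
    "[2023-02-07 18:07:21] [DETAIL]",
    "connection pointer is NULL",
    "[2023-02-07 18:07:21] [ERROR] _get_node_record(): unable to execute query",
    "[2023-02-07 18:07:21] [ERROR] unable to retrieve record for upstream node (ID: 1000), terminating",
])

print(failing_functions(sample))
# ['get_primary_node_id', 'reset_node_voting_status', '_get_node_record']
```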