Repmgr Fatal error - connection pointer is NULL

Question

AnjanaRatnayake opened this issue a year ago

My repmgr setup was a three-node cluster defined with the following docker-compose file:

networks:
  my-network:
    driver: bridge
    ipam:
      config:
        - subnet: 10.89.0.0/24

services:
  pg-0:
    hostname: pg-0
    container_name: pg-0
    image: 'postgresql-repmgr:latest'
    environment:
      - POSTGRESQL_PASSWORD=custompassword
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_NODE_NETWORK_NAME=pg-0
      - REPMGR_NODE_NAME=pg-0
      - REPMGR_PARTNER_NODES=pg-0,pg-1,pg-2
      - POSTGRESQL_PASSWORD=p
      - POSTGRESQL_POSTGRES_PASSWORD=p
      - REPMGR_FAILOVER=automatic
      - REPMGR_LOG_LEVEL=DEBUG
    networks:
      my-network:
        ipv4_address: 10.89.0.18
  pg-1:
    hostname: pg-1
    container_name: pg-1
    image: 'postgresql-repmgr:latest'
    environment:
      - POSTGRESQL_PASSWORD=custompassword
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_NODE_NETWORK_NAME=pg-1
      - REPMGR_NODE_NAME=pg-1
      - REPMGR_PARTNER_NODES=pg-0,pg-1,pg-2
      - POSTGRESQL_PASSWORD=p
      - POSTGRESQL_POSTGRES_PASSWORD=p
      - REPMGR_FAILOVER=automatic
      - REPMGR_LOG_LEVEL=DEBUG
    networks:
      my-network:
        ipv4_address: 10.89.0.19

  pg-2:
    hostname: pg-2
    container_name: pg-2
    image: 'postgresql-repmgr:latest'
    environment:
      - POSTGRESQL_PASSWORD=custompassword
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_NODE_NETWORK_NAME=pg-2
      - REPMGR_NODE_NAME=pg-2
      - REPMGR_PARTNER_NODES=pg-0,pg-1,pg-2
      - POSTGRESQL_PASSWORD=p
      - POSTGRESQL_POSTGRES_PASSWORD=p
      - REPMGR_FAILOVER=automatic
      - REPMGR_LOG_LEVEL=DEBUG
    networks:
      my-network:
        ipv4_address: 10.89.0.20

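Each service in the compose file above sets POSTGRESQL_PASSWORD twice (first to `custompassword`, then to `p`), so it is not obvious which value the container actually receives. The minimal sketch below flags keys that appear more than once in a list-form `environment:` block. It assumes the entries have already been pulled out of the YAML as `KEY=value` strings, and `duplicate_env_keys` is a hypothetical helper, not part of repmgr or Docker Compose.

```python
from collections import defaultdict


def duplicate_env_keys(entries):
    """Return {key: [values...]} for keys that appear more than once in a
    list-form Compose `environment:` block (entries like "KEY=value").
    An entry without "=" is recorded with the value None."""
    seen = defaultdict(list)
    for entry in entries:
        key, sep, value = entry.partition("=")
        seen[key.strip()].append(value if sep else None)
    # Keep only the keys that were set more than once, in first-seen order.
    return {k: v for k, v in seen.items() if len(v) > 1}


# Environment entries copied from the pg-0 service above.
pg0_env = [
    "POSTGRESQL_PASSWORD=custompassword",
    "REPMGR_PASSWORD=repmgrpassword",
    "REPMGR_PRIMARY_HOST=pg-0",
    "REPMGR_NODE_NETWORK_NAME=pg-0",
    "REPMGR_NODE_NAME=pg-0",
    "REPMGR_PARTNER_NODES=pg-0,pg-1,pg-2",
    "POSTGRESQL_PASSWORD=p",
    "POSTGRESQL_POSTGRES_PASSWORD=p",
    "REPMGR_FAILOVER=automatic",
    "REPMGR_LOG_LEVEL=DEBUG",
]

print(duplicate_env_keys(pg0_env))
# {'POSTGRESQL_PASSWORD': ['custompassword', 'p']}
```

The same check applies to pg-1 and pg-2, which repeat the duplicated entry.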
I ran a workload against this cluster that sent transactions, each consisting of updates, to every node in the cluster.

I also used Antithesis to inject packet loss and network latency, in order to test repmgr's resilience to network faults.

pg-2 crashed with the following error:

INFO:  node 1001 received notification to follow node 1000
[2023-02-07 18:07:21] [ERROR] get_primary_node_id(): unable to execute query
[2023-02-07 18:07:21] [DETAIL] query text is:
SELECT node_id            FROM repmgr.nodes     WHERE type = 'primary'    AND active IS TRUE
[2023-02-07 18:07:21] [ERROR] reset_node_voting_status(): local_conn not set
[2023-02-07 18:07:21] [DETAIL]
connection pointer is NULL
[2023-02-07 18:07:21] [ERROR] _get_node_record(): unable to execute query
[2023-02-07 18:07:21] [DETAIL] query text is:
SELECT n.node_id, n.type, n.upstream_node_id, n.node_name,  n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached   FROM repmgr.nodes n  WHERE n.node_id = 1000
[2023-02-07 18:07:21] [ERROR] unable to retrieve record for upstream node (ID: 1000), terminating