zalando / patroni

A template for PostgreSQL High Availability with Etcd, Consul, ZooKeeper, or Kubernetes


Failed to get list of machines from V3: <Unknown error: '404 page not found',>

CrimsonCyborg opened this issue

What happened?

I tried switching to V2, but I still faced this issue.

[root@pg_node02 ~]# curl http://127.0.0.1:2379/v2/members
{"members":[{"id":"1b7d2bfdd2693665","name":"etcd-01","peerURLs":["http://192.168.96.120:2380"],"clientURLs":["http://127.0.0.1:2379","http://192.168.96.120:2379"]},{"id":"9e59fff250c2e067","name":"etcd-03","peerURLs":["http://192.168.96.122:2380"],"clientURLs":["http://127.0.0.1:2379","http://192.168.96.122:2379"]},{"id":"c0961108ce538c9d","name":"etcd-02","peerURLs":["http://192.168.96.121:2380"],"clientURLs":["http://127.0.0.1:2379","http://192.168.96.121:2379"]}]}

Then I switched back to V3 (with --enable-v2=false), and here is the new issue.
[root@pg_node01 ~]# curl http://127.0.0.1:2379/version
{"etcdserver":"3.4.31","etcdcluster":"3.4.0"}
[root@pg_node01 ~]# curl http://127.0.0.1:2379/v3/version
404 page not found
[root@pg_node01 ~]# systemctl restart etcd
[root@pg_node01 ~]# curl http://127.0.0.1:2379/v3/version
404 page not found
[root@pg_node01 ~]# curl http://127.0.0.1:2379/v3/members
404 page not found

It seems to be neither V2 nor V3...
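As far as I know, the etcd v3 JSON gateway does not expose GET routes such as /v3/version or /v3/members, so a 404 on those two paths does not by itself show whether V3 is being served over HTTP. Cluster membership is served by a POST to /v3/cluster/member/list with a JSON body. The sketch below probes that path; the helper names (member_list_request, member_names, probe) are made up for illustration, and the base URL is the one used in the curl calls above.

```python
import json
import urllib.error
import urllib.request


def member_list_request(base_url):
    """Build a POST request for the v3 gateway member list (empty JSON body)."""
    url = base_url.rstrip("/") + "/v3/cluster/member/list"
    return urllib.request.Request(
        url,
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def member_names(payload):
    """Extract member names from a decoded member-list response."""
    return [m.get("name", "") for m in payload.get("members", [])]


def probe(base_url, timeout=3.0):
    """Return member names, or None if the gateway path answers 404."""
    request = member_list_request(base_url)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return member_names(json.load(resp))
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None  # the gateway route is not being served
        raise


# Usage (needs a reachable etcd, so it is left commented out):
# print(probe("http://127.0.0.1:2379"))
```

If this POST also returns 404, one thing worth checking (an assumption, not something confirmed in this report) is whether the gRPC gateway is enabled on the etcd side, for example via enable-grpc-gateway: true in the etcd config file.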

[root@pg_node01 ~]# /opt/etcd-v3.4.31/etcdctl endpoint status --cluster -w table
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://127.0.0.1:2379 | 1b7d2bfdd2693665 | 3.4.31 | 16 kB | false | false | 2043 | 58 | 58 | |
| http://192.168.96.120:2379 | 1b7d2bfdd2693665 | 3.4.31 | 16 kB | false | false | 2043 | 58 | 58 | |
| http://127.0.0.1:2379 | 1b7d2bfdd2693665 | 3.4.31 | 16 kB | false | false | 2043 | 58 | 58 | |
| http://192.168.96.122:2379 | 9e59fff250c2e067 | 3.4.31 | 20 kB | true | false | 2043 | 58 | 58 | |
| http://127.0.0.1:2379 | 1b7d2bfdd2693665 | 3.4.31 | 16 kB | false | false | 2043 | 58 | 58 | |
| http://192.168.96.121:2379 | c0961108ce538c9d | 3.4.31 | 20 kB | false | false | 2043 | 58 | 58 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Mar 29 07:33:31 pg_node01 etcd[1934]: established a TCP streaming connection with peer c0961108ce538c9d (stream Message writer)
Mar 29 07:33:31 pg_node01 etcd[1934]: established a TCP streaming connection with peer c0961108ce538c9d (stream MsgApp v2 writer)
Mar 29 07:33:31 pg_node01 etcd[1934]: 1b7d2bfdd2693665 initialized peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
Mar 29 07:33:31 pg_node01 etcd[1934]: raft2024/03/29 07:33:31 INFO: raft.node: 1b7d2bfdd2693665 elected leader 9e59fff250c2e067 at term 2043
Mar 29 07:33:31 pg_node01 etcd[1934]: ready to serve client requests
Mar 29 07:33:31 pg_node01 etcd[1934]: ready to serve client requests
Mar 29 07:33:31 pg_node01 etcd[1934]: published {Name:etcd-01 ClientURLs:[http://127.0.0.1:2379 http://192.168.96.120:2379]} to cluster d41890a8da40dfe8
Mar 29 07:33:31 pg_node01 systemd[1]: Started Etcd Server.
Mar 29 07:33:31 pg_node01 etcd[1934]: serving insecure client requests on 192.168.96.120:2379, this is strongly discouraged!
Mar 29 07:33:31 pg_node01 etcd[1934]: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
Mar 29 07:33:31 pg_node02 etcd[1898]: lost the TCP streaming connection with peer 1b7d2bfdd2693665 (stream Message reader)
Mar 29 07:33:31 pg_node02 etcd[1898]: failed to dial 1b7d2bfdd2693665 on stream MsgApp v2 (peer 1b7d2bfdd2693665 failed to find local node c0961108ce538c9d)
Mar 29 07:33:31 pg_node02 etcd[1898]: peer 1b7d2bfdd2693665 became inactive (message send to peer failed)
Mar 29 07:33:31 pg_node02 etcd[1898]: peer 1b7d2bfdd2693665 became active
Mar 29 07:33:31 pg_node02 etcd[1898]: closed an existing TCP streaming connection with peer 1b7d2bfdd2693665 (stream MsgApp v2 writer)
Mar 29 07:33:31 pg_node02 etcd[1898]: established a TCP streaming connection with peer 1b7d2bfdd2693665 (stream MsgApp v2 writer)
Mar 29 07:33:31 pg_node02 etcd[1898]: closed an existing TCP streaming connection with peer 1b7d2bfdd2693665 (stream Message writer)
Mar 29 07:33:31 pg_node02 etcd[1898]: established a TCP streaming connection with peer 1b7d2bfdd2693665 (stream Message writer)
Mar 29 07:33:31 pg_node02 etcd[1898]: established a TCP streaming connection with peer 1b7d2bfdd2693665 (stream Message reader)
Mar 29 07:33:31 pg_node02 etcd[1898]: established a TCP streaming connection with peer 1b7d2bfdd2693665 (stream MsgApp v2 reader)


name: etcd-1
data-dir: /opt/etcd-v3.4.31/data
listen-client-urls: http://192.168.96.120:2379,http://127.0.0.1:2379
advertise-client-urls: http://192.168.96.120:2379/ #,http://127.0.0.1:2379
listen-peer-urls: http://192.168.96.120:2380/
initial-advertise-peer-urls: http://192.168.96.120:2380/
initial-cluster: etcd-1=http://192.168.96.120:2380,etcd-2=http://192.168.96.121:2380,etcd-3=http://192.168.96.122:2380
initial-cluster-token: etcd-cluster-token
initial-cluster-state: new
#[root@pgnode1 etcd]#
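
The initial-cluster value in an etcd config is a comma-separated list of name=peerURL pairs, and a dropped "=" is easy to miss in a long line. A minimal sketch of a sanity check follows; split_initial_cluster is a hypothetical helper written for this note, not part of etcd or Patroni, and it only looks at the shape of each entry.

```python
def split_initial_cluster(value):
    """Split an initial-cluster string into (name, url) pairs and bad entries."""
    members, malformed = [], []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        name, sep, url = entry.partition("=")
        if sep and name and url.startswith(("http://", "https://")):
            members.append((name, url))
        else:
            malformed.append(entry)
    return members, malformed


# Example input; the last entry deliberately omits the "=" separator.
sample = (
    "etcd-1=http://192.168.96.120:2380,"
    "etcd-2=http://192.168.96.121:2380,"
    "etcd-3http://192.168.96.122:2380"
)
members, malformed = split_initial_cluster(sample)
print(members)
print(malformed)
```

Any entry that ends up in the malformed list is worth comparing against the config file actually loaded by the running etcd service.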

How can we reproduce it (as minimally and precisely as possible)?

I really don't know. I can provide more information if you want.

What did you expect to happen?

pass

Patroni/PostgreSQL/DCS version

  • Patroni version: 3.2.2
  • PostgreSQL version: 15.6
  • DCS (and its version): etcd 3.4.31 (on RHEL 8)

Patroni configuration file

scope: pgsql16
namespace: /pgsql/
name: pgsql_slot1
restapi:
  listen: 192.168.96.120:8008
  connect_address: 192.168.96.120:8008
etcd3:
  hosts: 192.168.96.120:2379,192.168.96.121:2379,192.168.96.122:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    master_start_timeout: 300
    synchronous_mode: false
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        listen_addresses: "0.0.0.0"
        port: 5432
        wal_level: logical
        hot_standby: "on"
        wal_keep_segments: 1000
        max_wal_senders: 10
        max_replication_slots: 10
        wal_log_hints: "on"
postgresql:
  listen: 0.0.0.0:5432
  connect_address: 192.168.96.120:5432
  data_dir: /software/pgsql/data
  bin_dir: /software/pgsql/bin
  authentication:
    replication:
      username: replica
      password: post1234
    superuser:
      username: postgres
      password: post1234
  basebackup:
    #max-rate: 100M
    checkpoint: fast
  
  callbacks: 
    on_start: /bin/bash /etc/patroni/patroni_callback.sh 
    on_stop: /bin/bash /etc/patroni/patroni_callback.sh 
    on_role_change: /bin/bash /etc/patroni/patroni_callback.sh 
  
watchdog:
  mode: automatic # Allowed values: off, automatic, required
  device: /dev/watchdog 
  safety_margin: 5
tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
[root@pgnode1 etcd]#

patronictl show-config

Error: Can not find suitable configuration of distributed configuration store
Available implementations: etcd, etcd3, kubernetes

Patroni log files

[root@pg_node02 ~]# systemctl status patroni
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
   Loaded: loaded (/usr/lib/systemd/system/patroni.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2024-03-29 06:01:15 EDT; 20min ago
 Main PID: 1965 (patroni)
    Tasks: 2 (limit: 26213)
   Memory: 23.1M
   CGroup: /system.slice/patroni.service
           └─1965 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni/patroni.yml

Mar 29 06:21:32 pg_node02 patroni[1965]: 2024-03-29 06:21:32,944 ERROR: Failed to get list of machines from http://192.168.96.120:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:32 pg_node02 patroni[1965]: 2024-03-29 06:21:32,944 INFO: waiting on etcd
Mar 29 06:21:37 pg_node02 patroni[1965]: 2024-03-29 06:21:37,993 ERROR: Failed to get list of machines from http://192.168.96.121:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:37 pg_node02 patroni[1965]: 2024-03-29 06:21:37,995 ERROR: Failed to get list of machines from http://192.168.96.122:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:37 pg_node02 patroni[1965]: 2024-03-29 06:21:37,998 ERROR: Failed to get list of machines from http://192.168.96.120:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:37 pg_node02 patroni[1965]: 2024-03-29 06:21:37,998 INFO: waiting on etcd
Mar 29 06:21:43 pg_node02 patroni[1965]: 2024-03-29 06:21:43,046 ERROR: Failed to get list of machines from http://192.168.96.121:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:43 pg_node02 patroni[1965]: 2024-03-29 06:21:43,048 ERROR: Failed to get list of machines from http://192.168.96.122:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:43 pg_node02 patroni[1965]: 2024-03-29 06:21:43,049 ERROR: Failed to get list of machines from http://192.168.96.120:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:43 pg_node02 patroni[1965]: 2024-03-29 06:21:43,049 INFO: waiting on etcd

PostgreSQL log files

2024-03-29 03:53:38.972 EDT [1767] LOG:  starting PostgreSQL 15.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-3), 64-bit
2024-03-29 03:53:38.972 EDT [1767] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-03-29 03:53:38.972 EDT [1767] LOG:  listening on IPv6 address "::", port 5432
2024-03-29 03:53:38.973 EDT [1767] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-03-29 03:53:38.977 EDT [1771] LOG:  database system was shut down at 2024-03-29 03:50:06 EDT
2024-03-29 03:53:38.984 EDT [1767] LOG:  database system is ready to accept connections
2024-03-29 03:56:12.958 EDT [1769] LOG:  checkpoint starting: force wait
2024-03-29 03:56:12.961 EDT [1769] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.004 s; sync files=2, longest=0.001 s, average=0.001 s; distance=16384 kB, estimate=16384 kB
2024-03-29 03:57:45.163 EDT [1769] LOG:  checkpoint starting: force wait
2024-03-29 03:57:45.165 EDT [1769] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB
2024-03-29 04:02:45.265 EDT [1769] LOG:  checkpoint starting: time
2024-03-29 04:02:45.268 EDT [1769] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16384 kB, estimate=31129 kB
2024-03-29 04:24:04.184 EDT [1767] LOG:  received smart shutdown request
2024-03-29 04:24:04.184 EDT [1767] LOG:  received SIGHUP, reloading configuration files
2024-03-29 04:24:04.187 EDT [1767] LOG:  background worker "logical replication launcher" (PID 1775) exited with exit code 1
2024-03-29 04:24:04.190 EDT [1769] LOG:  shutting down
2024-03-29 04:24:04.221 EDT [1769] LOG:  checkpoint starting: shutdown immediate
2024-03-29 04:24:04.224 EDT [1769] LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.004 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=29655 kB
2024-03-29 04:24:04.226 EDT [1767] LOG:  database system is shut down

Have you tried to use GitHub issue search?

  • Yes

Anything else we need to know?

I have tried switching to V2.