Failed to get list of machines from V3: <Unknown error: '404 page not found'>
CrimsonCyborg opened this issue
What happened?
I have tried switching to V2, but I still faced this issue.
[root@pg_node02 ~]# curl http://127.0.0.1:2379/v2/members
{"members":[{"id":"1b7d2bfdd2693665","name":"etcd-01","peerURLs":["http://192.168.96.120:2380"],"clientURLs":["http://127.0.0.1:2379","http://192.168.96.120:2379"]},{"id":"9e59fff250c2e067","name":"etcd-03","peerURLs":["http://192.168.96.122:2380"],"clientURLs":["http://127.0.0.1:2379","http://192.168.96.122:2379"]},{"id":"c0961108ce538c9d","name":"etcd-02","peerURLs":["http://192.168.96.121:2380"],"clientURLs":["http://127.0.0.1:2379","http://192.168.96.121:2379"]}]}
Then I switched to V3 again (with --enable-v2=false), and here is the new issue.
[root@pg_node01 ~]# curl http://127.0.0.1:2379/version
{"etcdserver":"3.4.31","etcdcluster":"3.4.0"}[root@pg_node01 ~]# curl http://127.0.0.1:2379/v3/version
404 page not found
[root@pg_node01 ~]# systemctl restart etcd
[root@pg_node01 ~]# curl http://127.0.0.1:2379/v3/version
404 page not found
[root@pg_node01 ~]# curl http://127.0.0.1:2379/v3/members
404 page not found
It seems to be neither V2 nor V3...
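A note on the probes above: a 404 on `GET /v3/version` or `GET /v3/members` does not by itself show that the v3 API is off. As far as I know, etcd's v3 HTTP API is a gRPC gateway that expects POST requests with a JSON body on paths such as `/v3/cluster/member/list`, while the version is served at `/version`. The following is a minimal sketch of such a probe in Python, using only the standard library. The helper names `build_member_list_request` and `probe` are hypothetical, and the `/v3` prefix is an assumption based on the etcd 3.4 gateway documentation.

```python
import urllib.error
import urllib.request


def build_member_list_request(endpoint):
    """Build a POST request for the v3 gateway member-list call."""
    url = endpoint.rstrip("/") + "/v3/cluster/member/list"
    return urllib.request.Request(
        url,
        data=b"{}",  # the gateway takes a JSON body; an empty object is enough here
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def probe(endpoint, timeout=3.0):
    """Return (HTTP status, decoded body) for the member-list call.

    Connection-level failures (for example a refused connection) are
    not handled here and will raise urllib.error.URLError.
    """
    request = build_member_list_request(endpoint)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode()
    except urllib.error.HTTPError as err:
        # A 404 here means this path is not served on this listener.
        return err.code, err.read().decode()
```

Calling `probe("http://127.0.0.1:2379")` on one of the nodes would show whether the gateway answers. If that POST also returns 404, one thing to check is whether the gRPC gateway is enabled for this etcd, for example through its `enable-grpc-gateway` setting. That is an assumption on my part and is not confirmed by the logs in this report.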
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://127.0.0.1:2379 | 1b7d2bfdd2693665 | 3.4.31 | 16 kB | false | false | 2043 | 58 | 58 | |
| http://192.168.96.120:2379 | 1b7d2bfdd2693665 | 3.4.31 | 16 kB | false | false | 2043 | 58 | 58 | |
| http://127.0.0.1:2379 | 1b7d2bfdd2693665 | 3.4.31 | 16 kB | false | false | 2043 | 58 | 58 | |
| http://192.168.96.122:2379 | 9e59fff250c2e067 | 3.4.31 | 20 kB | true | false | 2043 | 58 | 58 | |
| http://127.0.0.1:2379 | 1b7d2bfdd2693665 | 3.4.31 | 16 kB | false | false | 2043 | 58 | 58 | |
| http://192.168.96.121:2379 | c0961108ce538c9d | 3.4.31 | 20 kB | false | false | 2043 | 58 | 58 | |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Mar 29 07:33:31 pg_node01 etcd[1934]: established a TCP streaming connection with peer c0961108ce538c9d (stream Message writer)
Mar 29 07:33:31 pg_node01 etcd[1934]: established a TCP streaming connection with peer c0961108ce538c9d (stream MsgApp v2 writer)
Mar 29 07:33:31 pg_node01 etcd[1934]: 1b7d2bfdd2693665 initialized peer connection; fast-forwarding 8 ticks (election ticks 10) with 2 active peer(s)
Mar 29 07:33:31 pg_node01 etcd[1934]: raft2024/03/29 07:33:31 INFO: raft.node: 1b7d2bfdd2693665 elected leader 9e59fff250c2e067 at term 2043
Mar 29 07:33:31 pg_node01 etcd[1934]: ready to serve client requests
Mar 29 07:33:31 pg_node01 etcd[1934]: ready to serve client requests
Mar 29 07:33:31 pg_node01 etcd[1934]: published {Name:etcd-01 ClientURLs:[http://127.0.0.1:2379 http://192.168.96.120:2379]} to cluster d41890a8da40dfe8
Mar 29 07:33:31 pg_node01 systemd[1]: Started Etcd Server.
Mar 29 07:33:31 pg_node01 etcd[1934]: serving insecure client requests on 192.168.96.120:2379, this is strongly discouraged!
Mar 29 07:33:31 pg_node01 etcd[1934]: serving insecure client requests on 127.0.0.1:2379, this is strongly discouraged!
[root@pg_node01 ~]# /opt/etcd-v3.4.31/etcdctl endpoint status --cluster -w table
Mar 29 07:33:31 pg_node02 etcd[1898]: lost the TCP streaming connection with peer 1b7d2bfdd2693665 (stream Message reader)
Mar 29 07:33:31 pg_node02 etcd[1898]: failed to dial 1b7d2bfdd2693665 on stream MsgApp v2 (peer 1b7d2bfdd2693665 failed to find local node c0961108ce538c9d)
Mar 29 07:33:31 pg_node02 etcd[1898]: peer 1b7d2bfdd2693665 became inactive (message send to peer failed)
Mar 29 07:33:31 pg_node02 etcd[1898]: peer 1b7d2bfdd2693665 became active
Mar 29 07:33:31 pg_node02 etcd[1898]: closed an existing TCP streaming connection with peer 1b7d2bfdd2693665 (stream MsgApp v2 writer)
Mar 29 07:33:31 pg_node02 etcd[1898]: established a TCP streaming connection with peer 1b7d2bfdd2693665 (stream MsgApp v2 writer)
Mar 29 07:33:31 pg_node02 etcd[1898]: closed an existing TCP streaming connection with peer 1b7d2bfdd2693665 (stream Message writer)
Mar 29 07:33:31 pg_node02 etcd[1898]: established a TCP streaming connection with peer 1b7d2bfdd2693665 (stream Message writer)
Mar 29 07:33:31 pg_node02 etcd[1898]: established a TCP streaming connection with peer 1b7d2bfdd2693665 (stream Message reader)
Mar 29 07:33:31 pg_node02 etcd[1898]: established a TCP streaming connection with peer 1b7d2bfdd2693665 (stream MsgApp v2 reader)
name: etcd-1
data-dir: /opt/etcd-v3.4.31/data
listen-client-urls: http://192.168.96.120:2379,http://127.0.0.1:2379
advertise-client-urls: http://192.168.96.120:2379/ #,http://127.0.0.1:2379
listen-peer-urls: http://192.168.96.120:2380/
initial-advertise-peer-urls: http://192.168.96.120:2380/
initial-cluster: etcd-1=http://192.168.96.120:2380,etcd-2=http://192.168.96.121:2380,etcd-3http://192.168.96.122:2380
initial-cluster-token: etcd-cluster-token
initial-cluster-state: new
#[root@pgnode1 etcd]#
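One detail in the config above may be worth checking: `initial-cluster` entries are normally written as `name=peer-url`, and the third entry has no `=`. A rough sketch of a check for that, in Python. `parse_initial_cluster` is a hypothetical helper, not part of etcd or Patroni, and the string is copied from the file above.

```python
def parse_initial_cluster(value):
    """Split an initial-cluster string into (members, malformed entries)."""
    members, malformed = {}, []
    for entry in value.split(","):
        entry = entry.strip()
        name, sep, url = entry.partition("=")
        if sep and name and url.startswith(("http://", "https://")):
            members[name] = url
        else:
            # No "name=" prefix or no URL scheme: report the entry as-is.
            malformed.append(entry)
    return members, malformed


# Value of initial-cluster exactly as it appears in the config above.
initial_cluster = (
    "etcd-1=http://192.168.96.120:2380,"
    "etcd-2=http://192.168.96.121:2380,"
    "etcd-3http://192.168.96.122:2380"
)

members, malformed = parse_initial_cluster(initial_cluster)
print("members:", sorted(members))
print("malformed:", malformed)
```

For the value above, the third entry is reported as malformed. Whether this matters for an already bootstrapped cluster is a separate question; the check only verifies the syntax of the setting.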
How can we reproduce it (as minimally and precisely as possible)?
I really don't know. I can provide more information if needed.
What did you expect to happen?
pass
Patroni/PostgreSQL/DCS version
- Patroni version: 3.2.2
- PostgreSQL version: 15.6
- DCS (and its version): rhel8
Patroni configuration file
scope: pgsql16
namespace: /pgsql/
name: pgsql_slot1
restapi:
listen: 192.168.96.120:8008
connect_address: 192.168.96.120:8008
etcd3:
hosts: 192.168.96.120:2379,192.168.96.121:2379,192.168.96.122:2379
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
master_start_timeout: 300
synchronous_mode: false
postgresql:
use_pg_rewind: true
use_slots: true
parameters:
listen_addresses: "0.0.0.0"
port: 5432
wal_level: logical
hot_standby: "on"
wal_keep_segments: 1000
max_wal_senders: 10
max_replication_slots: 10
wal_log_hints: "on"
postgresql:
listen: 0.0.0.0:5432
connect_address: 192.168.96.120:5432
data_dir: /software/pgsql/data
bin_dir: /software/pgsql/bin
authentication:
replication:
username: replica
password: post1234
superuser:
username: postgres
password: post1234
basebackup:
#max-rate: 100M
checkpoint: fast
callbacks:
on_start: /bin/bash /etc/patroni/patroni_callback.sh
on_stop: /bin/bash /etc/patroni/patroni_callback.sh
on_role_change: /bin/bash /etc/patroni/patroni_callback.sh
watchdog:
mode: automatic # Allowed values: off, automatic, required
device: /dev/watchdog
safety_margin: 5
tags:
nofailover: false
noloadbalance: false
clonefrom: false
nosync: false
[root@pgnode1 etcd]#
patronictl show-config
Error: Can not find suitable configuration of distributed configuration store
Available implementations: etcd, etcd3, kubernetes
Patroni log files
[root@pg_node02 ~]# systemctl status patroni
● patroni.service - Runners to orchestrate a high-availability PostgreSQL
Loaded: loaded (/usr/lib/systemd/system/patroni.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2024-03-29 06:01:15 EDT; 20min ago
Main PID: 1965 (patroni)
Tasks: 2 (limit: 26213)
Memory: 23.1M
CGroup: /system.slice/patroni.service
└─1965 /usr/bin/python3 /usr/local/bin/patroni /etc/patroni/patroni.yml
Mar 29 06:21:32 pg_node02 patroni[1965]: 2024-03-29 06:21:32,944 ERROR: Failed to get list of machines from http://192.168.96.120:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:32 pg_node02 patroni[1965]: 2024-03-29 06:21:32,944 INFO: waiting on etcd
Mar 29 06:21:37 pg_node02 patroni[1965]: 2024-03-29 06:21:37,993 ERROR: Failed to get list of machines from http://192.168.96.121:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:37 pg_node02 patroni[1965]: 2024-03-29 06:21:37,995 ERROR: Failed to get list of machines from http://192.168.96.122:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:37 pg_node02 patroni[1965]: 2024-03-29 06:21:37,998 ERROR: Failed to get list of machines from http://192.168.96.120:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:37 pg_node02 patroni[1965]: 2024-03-29 06:21:37,998 INFO: waiting on etcd
Mar 29 06:21:43 pg_node02 patroni[1965]: 2024-03-29 06:21:43,046 ERROR: Failed to get list of machines from http://192.168.96.121:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:43 pg_node02 patroni[1965]: 2024-03-29 06:21:43,048 ERROR: Failed to get list of machines from http://192.168.96.122:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:43 pg_node02 patroni[1965]: 2024-03-29 06:21:43,049 ERROR: Failed to get list of machines from http://192.168.96.120:2379/v3: <Unknown error: '404 page not found',>
Mar 29 06:21:43 pg_node02 patroni[1965]: 2024-03-29 06:21:43,049 INFO: waiting on etcd
PostgreSQL log files
2024-03-29 03:53:38.972 EDT [1767] LOG: starting PostgreSQL 15.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-3), 64-bit
2024-03-29 03:53:38.972 EDT [1767] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-03-29 03:53:38.972 EDT [1767] LOG: listening on IPv6 address "::", port 5432
2024-03-29 03:53:38.973 EDT [1767] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-03-29 03:53:38.977 EDT [1771] LOG: database system was shut down at 2024-03-29 03:50:06 EDT
2024-03-29 03:53:38.984 EDT [1767] LOG: database system is ready to accept connections
2024-03-29 03:56:12.958 EDT [1769] LOG: checkpoint starting: force wait
2024-03-29 03:56:12.961 EDT [1769] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.001 s, total=0.004 s; sync files=2, longest=0.001 s, average=0.001 s; distance=16384 kB, estimate=16384 kB
2024-03-29 03:57:45.163 EDT [1769] LOG: checkpoint starting: force wait
2024-03-29 03:57:45.165 EDT [1769] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB
2024-03-29 04:02:45.265 EDT [1769] LOG: checkpoint starting: time
2024-03-29 04:02:45.268 EDT [1769] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.003 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16384 kB, estimate=31129 kB
2024-03-29 04:24:04.184 EDT [1767] LOG: received smart shutdown request
2024-03-29 04:24:04.184 EDT [1767] LOG: received SIGHUP, reloading configuration files
2024-03-29 04:24:04.187 EDT [1767] LOG: background worker "logical replication launcher" (PID 1775) exited with exit code 1
2024-03-29 04:24:04.190 EDT [1769] LOG: shutting down
2024-03-29 04:24:04.221 EDT [1769] LOG: checkpoint starting: shutdown immediate
2024-03-29 04:24:04.224 EDT [1769] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.004 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=29655 kB
2024-03-29 04:24:04.226 EDT [1767] LOG: database system is shut down
~
Have you tried to use GitHub issue search?
- Yes
Anything else we need to know?
I have tried switching to V2.