IntersectMBO / cardano-node

The core component that is used to participate in a Cardano decentralised blockchain.

Home Page:https://cardano.org

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[BUG] - Connection Handshake Error and Promotion Failure of Cold Big Ledger Peers in Cardano Node v8.9.2 Docker Image

astroz777 opened this issue · comments

Internal Issue

Area: Node Connection Management

Summary:
The Cardano node experiences connectivity issues with establishing and maintaining peer connections. These issues result in handshake errors and connection failures. Additionally, promotion attempts of cold big ledger peers fail due to connection errors, indicating potential issues with the availability of remote nodes. These problems significantly affect the node's performance, leading to a decrease in download speed by 90% and the validation of only one slot per second, particularly after reaching 120 million slots.

Steps to reproduce:

1.Download the Docker image ghcr.io/intersectmbo/cardano-node:8.9.2.
2.Use Docker to create a container.
3.Monitor the logs for errors related to connection management and promotion of cold big ledger peers.

System info:

OS Name: Ubuntu
OS Version: 22.04.1
Node version: Docker image ghcr.io/intersectmbo/cardano-node:8.9.2.
CLI version: Docker image ghcr.io/intersectmbo/cardano-node:8.9.2.
Docker version 26.1.0, build 9714adc

Additional context:
The logs indicate various errors, including handshake errors during connection establishment and connection refusal while attempting to promote cold big ledger peers. These errors suggest potential issues with the availability or accessibility of remote nodes, impacting the node's overall performance.

Initial Logs:

Logs:
2024-04-23 18:46:27 [8c5f5845:cardano.node.ChainDB:Info:5] [2024-04-23 21:46:27.09 UTC] Closed db with immutable tip at 0c08287b680a798aa7176d83fbacfa8bb196e62b18bd904037d497bf237fd3a0 at slot 122296839 and tip 6c2a171daf4d0350e1995fd718e79dbeb170f1a5403764c1605dfa64a1e662b6 at slot 122342476
2024-04-23 18:46:27
2024-04-23 18:46:27 Shutting down..
2024-04-23 18:46:27 cardano-node: thread killed
2024-04-23 18:59:36 Starting: exec /nix/store/9ph9pp8d21y17k95mhgk8jk2z2036dhj-cardano-node-exe-cardano-node-8.9.2/bin/cardano-node run
2024-04-23 18:59:36 --config /nix/store/76zh2cy72zz8bxh7k5c4v7rgiba9qrk1-config-0-0.json
2024-04-23 18:59:36 --database-path /data/db
2024-04-23 18:59:36 --topology /nix/store/czj80vj5a9vg1xdg3sgqpfj0hc8zv3pb-topology.yaml
2024-04-23 18:59:36 --host-addr 0.0.0.0
2024-04-23 18:59:36 --port 3001
2024-04-23 18:59:36 --socket-path /ipc/node.socket
Connection Handshake Error:
Logs:

[8c5f5845:cardano.node.ConnectionManager:Info:209] [2024-04-23 21:27:11.24 UTC] TrConnectionHandler (ConnectionId {localAddress = 172.17.0.2:3001, remoteAddress = 13.41.234.173:3001}) (TrConnectionHandlerError OutboundError ExceededTimeLimit ...)
Summary: A connection handshake error occurred while attempting to establish a connection with a remote node. The error message indicates that the handshake process exceeded the time limit.

Promotion Failure of Cold Big Ledger Peer:
Logs:

[8c5f5845:cardano.node.PeerSelection:Info:172] [2024-04-23 21:27:24.28 UTC] TracePromoteColdBigLedgerPeerFailed 10 9 3.216.77.109:3001 9.624079842171s Network.Socket.connect: <socket: 45>: does not exist (Connection refused)
Summary: The attempt to promote a cold big ledger peer failed due to a connection error. The connection was refused, indicating a possible issue with the availability of the remote node.

Is there any way to proceed? How can we address the issue to resume the chain synchronization process?

I'd like to provide additional information regarding an issue I encountered. It appears that whenever I attempt to reconnect with peers, my network experiences some form of restriction or blockage. Consequently, the connection becomes significantly slow. Yesterday, I closed the node and reopened it today. Initially, it functioned smoothly for approximately three minutes, during which time it successfully downloaded and validated 92,538 slots at a rate of 514 per second. However, shortly thereafter, the connection error with peers resurfaced, causing a drastic slowdown in both download and validation speeds to just one slot per second.

Furthermore, I should mention that I am utilizing a Ryzen 5 5600X processor and 32 GB of RAM. Below, I will include the logs from the last slot validated during the "fast mode" until the subsequent slot, which experienced a slowdown.

2024-04-24 22:36:50 [8c5f5845:cardano.node.GSM:Info:146] [2024-04-25 01:36:50.22 UTC] fromList [("currentSelection",Object (fromList [("kind",String "Tip"),("tipBlockNo",Number 1.0230401e7),("tipHash",String "722d22bb1f25e440d57b5e453d977a01f3be4960d90091420cea4aa0ce9dc254"),("tipSlotNo",Number 1.22442706e8)])),("kind",String "GsmEventEnterCaughtUp"),("peerNumber",Number 2.0)]
2024-04-24 22:36:50 [8c5f5845:cardano.node.PeerSelection:Notice:160] [2024-04-25 01:36:50.22 UTC] TraceLedgerStateJudgementChanged YoungEnough
2024-04-24 22:36:50 [8c5f5845:cardano.node.PeerSelectionCounters:Info:160] [2024-04-25 01:36:50.22 UTC] PeerSelectionCounters {coldPeers = 0, warmPeers = 1, hotPeers = 2, coldBigLedgerPeers = 0, warmBigLedgerPeers = 0, hotBigLedgerPeers = 0, localRoots = []}
2024-04-24 22:36:50 [8c5f5845:cardano.node.PeerSelection:Info:160] [2024-04-25 01:36:50.22 UTC] TraceBigLedgerPeersRequest 15 0
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.22 UTC] [String "FetchingNewLedgerState",Number 529.0,Number 2842.0]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessAddress \"13.208.79.46\" 6000",String "PickedBigLedgerPeer",Number 8.216508481077716e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessDomain \"relays.ektrp.com\" 5859",String "PickedBigLedgerPeer",Number 1.5988289600208803e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessAddress \"52.167.20.127\" 4000",String "PickedBigLedgerPeer",Number 9.952803016943504e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessAddress \"3.228.183.84\" 3001",String "PickedBigLedgerPeer",Number 1.477863825118689e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessDomain \"cardano-relay-1.upbit.com\" 30800",String "PickedBigLedgerPeer",Number 1.213966274671122e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessAddress \"52.177.36.96\" 3000",String "PickedBigLedgerPeer",Number 2.0126865838078914e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessDomain \"ca.relays.cardanians.io\" 1000",String "PickedBigLedgerPeer",Number 3.0810352060967883e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessDomain \"r1.isp-r1.wjg.jp\" 3001",String "PickedBigLedgerPeer",Number 9.45254952409071e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessAddress \"149.102.140.196\" 6000",String "PickedBigLedgerPeer",Number 3.1429371641176667e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessAddress \"34.249.11.89\" 3001",String "PickedBigLedgerPeer",Number 6.547682720921369e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessDomain \"cardano-relays-2.nu.fi\" 3001",String "PickedBigLedgerPeer",Number 9.85369740943107e-4]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessAddress \"3.222.153.137\" 3001",String "PickedBigLedgerPeer",Number 1.4786498276707937e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessAddress \"54.36.178.85\" 6000",String "PickedBigLedgerPeer",Number 2.773403186912947e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessDomain \"relay1.ada-stake.com\" 3001",String "PickedBigLedgerPeer",Number 1.06290992637487e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "RelayAccessDomain \"relay1.kaizn.kaizencrypto.com\" 6000",String "PickedBigLedgerPeer",Number 1.2318284900528509e-3]
2024-04-24 22:36:50 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:50.25 UTC] [String "[RelayAccessDomain \"relay1.kaizn.kaizencrypto.com\" 6000,RelayAccessDomain \"relay1.ada-stake.com\" 3001,RelayAccessAddress \"54.36.178.85\" 6000,RelayAccessAddress \"3.222.153.137\" 3001,RelayAccessDomain \"cardano-relays-2.nu.fi\" 3001,RelayAccessAddress \"34.249.11.89\" 3001,RelayAccessAddress \"149.102.140.196\" 6000,RelayAccessDomain \"r1.isp-r1.wjg.jp\" 3001,RelayAccessDomain \"ca.relays.cardanians.io\" 1000,RelayAccessAddress \"52.177.36.96\" 3000,RelayAccessDomain \"cardano-relay-1.upbit.com\" 30800,RelayAccessAddress \"3.228.183.84\" 3001,RelayAccessAddress \"52.167.20.127\" 4000,RelayAccessDomain \"relays.ektrp.com\" 5859,RelayAccessAddress \"13.208.79.46\" 6000]",Number 15.0,Number 15.0,String "PickedBigLedgerPeers"]
2024-04-24 22:36:51 [8c5f5845:cardano.node.LedgerPeers:Info:158] [2024-04-25 01:36:51.50 UTC] [String "UseLedgerPeers",Number 1.16812831e8]
2024-04-24 22:36:51 [8c5f5845:cardano.node.PeerSelection:Info:160] [2024-04-25 01:36:51.50 UTC] TraceBigLedgerPeersResults (fromList [51.89.97.27:1000,52.177.36.96:3000,34.147.178.39:3001,207.154.243.52:3001,3.228.183.84:3001,34.249.11.89:3001,3.222.153.137:3001,62.112.8.253:3001,52.167.20.127:4000,194.99.21.198:5859,13.208.79.46:6000,54.36.178.85:6000,149.102.140.196:6000,54.38.153.200:6000,15.164.112.189:30800]) 0 5s
2024-04-24 22:36:51 [8c5f5845:cardano.node.PeerSelectionCounters:Info:160] [2024-04-25 01:36:51.50 UTC] PeerSelectionCounters {coldPeers = 0, warmPeers = 1, hotPeers = 2, coldBigLedgerPeers = 15, warmBigLedgerPeers = 0, hotBigLedgerPeers = 0, localRoots = []}
2024-04-24 22:36:51 [8c5f5845:cardano.node.PeerSelection:Info:160] [2024-04-25 01:36:51.50 UTC] TracePromoteColdBigLedgerPeers 10 0 (fromList [52.177.36.96:3000,34.147.178.39:3001,207.154.243.52:3001,3.228.183.84:3001,34.249.11.89:3001,62.112.8.253:3001,194.99.21.198:5859,13.208.79.46:6000,54.36.178.85:6000,149.102.140.196:6000])
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:482] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:483] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:485] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:484] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:486] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:488] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:487] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:489] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:491] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:490] [2024-04-25 01:36:51.50 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 3})
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:490] [2024-04-25 01:36:51.72 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 4})
2024-04-24 22:36:51 [8c5f5845:cardano.node.Handshake:Info:492] [2024-04-25 01:36:51.72 UTC] WithMuxBearer (ConnectionId {localAddress = 172.17.0.2:3001, remoteAddress = 54.36.178.85:6000}) Send (ClientAgency TokPropose,MsgProposeVersions (fromList [(NodeToNodeV_7,TList [TInt 764824073,TBool False]),(NodeToNodeV_8,TList [TInt 764824073,TBool False]),(NodeToNodeV_9,TList [TInt 764824073,TBool False]),(NodeToNodeV_10,TList [TInt 764824073,TBool False]),(NodeToNodeV_11,TList [TInt 764824073,TBool False,TInt 0,TBool False]),(NodeToNodeV_12,TList [TInt 764824073,TBool False,TInt 0,TBool False]),(NodeToNodeV_13,TList [TInt 764824073,TBool False,TInt 0,TBool False])]))
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:483] [2024-04-25 01:36:51.73 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 5})
2024-04-24 22:36:51 [8c5f5845:cardano.node.Handshake:Info:493] [2024-04-25 01:36:51.73 UTC] WithMuxBearer (ConnectionId {localAddress = 172.17.0.2:3001, remoteAddress = 34.147.178.39:3001}) Send (ClientAgency TokPropose,MsgProposeVersions (fromList [(NodeToNodeV_7,TList [TInt 764824073,TBool False]),(NodeToNodeV_8,TList [TInt 764824073,TBool False]),(NodeToNodeV_9,TList [TInt 764824073,TBool False]),(NodeToNodeV_10,TList [TInt 764824073,TBool False]),(NodeToNodeV_11,TList [TInt 764824073,TBool False,TInt 0,TBool False]),(NodeToNodeV_12,TList [TInt 764824073,TBool False,TInt 0,TBool False]),(NodeToNodeV_13,TList [TInt 764824073,TBool False,TInt 0,TBool False])]))
2024-04-24 22:36:51 [8c5f5845:cardano.node.ConnectionManager:Info:487] [2024-04-25 01:36:51.73 UTC] TrConnectionManagerCounters (ConnectionManagerCounters {fullDuplexConns = 0, duplexConns = 3, unidirectionalConns = 0, inboundConns = 0, outboundConns = 6})```

Thanks, @astroz777, for your report. Could you check if the useLedgerAfterSlot in the topology.json file doesn't point to a too old slot? If the node tries to use the ledger peers too early, it will get outdated information for the ledger about relays that are no longer maintained.

Hello! @coot Thanks for the help. I'm using the default topology from the Docker image. Should I change it to another one, or just increase the useLedgerAfterSlot number to a higher value?"

{"bootstrapPeers":[{"address":"backbone.cardano.iog.io","port":3001},{"address":"backbone.mainnet.emurgornd.com","port":3001}],"localRoots":[],"publicRoots":[],"useLedgerAfterSlot":116812831}

You're useLedgerAfterSlot seems fine. Could you share the logs when the incident happened? You can send them privately to my email marcin.szamotulski@iohk.io or on matrix @the-coot:matrix.org.