paritytech / polkadot-sdk

The Parity Polkadot Blockchain SDK

Home Page: https://polkadot.network/

Can't sync a kusama-people node from scratch with 1.12, works with 1.11: regression?

rvalle opened this issue

Hi!

I can't sync a Kusama People parachain node. I am running into problems I have never seen when running any of my other nodes.

I am using the Docker distribution, polkadot-parachain:1.12.0, together with the relay chain RPC interface.

A vanilla start with docker run --rm parity/polkadot-parachain:1.12.0 --chain people-kusama complains about most (or all, not sure) nodes having a genesis mismatch:

2024-05-28 09:37:31 [Parachain] Report 12D3KooWS7uzh62LChjfbyYGj1U5yGYaKNWMzzh6AAWHiJ5aLYLH: -2147483648 to -2147483648. Reason: Genesis mismatch. Banned, disconnecting.

I then restrict the node to the boot nodes only, using the --reserved-only flag with the boot nodes as reserved nodes, and then I get parachain blocks, but no finalizations,

eventually I reach the top of the chain, also using the RelayChain RPC interface:

2024-05-28 09:40:46 [Parachain] ⚙️  Preparing  0.0 bps, target=#94115 (1 peers), best: #94015 (0x2dee…a9c6), finalized #0 (0xc1af…8b3f), ⬇ 12 B/s ⬆ 26 B/s

but no blocks appear to be finalized,

and eventually I get this other warning constantly:

2024-05-28 09:40:42 [Parachain] Event distribution channel has reached its limit. This can lead to missed notifications. error=TrySendError { kind: Full }

I am not sure what is going on. I am using the new default --prune archive-canonical and have also tried the different sync modes (fast, warp), but nothing seems to make a difference.

I have also tried the polkadot-collator image, but it does not seem to include the Kusama People spec.

What am I missing?

Here is an example command that won't work:

docker run \
   parity/polkadot-parachain:1.12.0 \
   --chain people-kusama \
   --relay-chain-rpc-url wss://rpc.ibp.network/kusama

It eventually fails, killing the node, with:

2024-05-28 12:05:23 [Parachain] ⚙️  Syncing 391.4 bps, target=#94755 (7 peers), best: #85939 (0x294d…66f1), finalized #0 (0xc1af…8b3f), ⬇ 2.2MiB/s ⬆ 1.3kiB/s    
2024-05-28 12:05:24 [Relaychain] Received imported block via RPC: #23365318 (0x28bd…88f7 -> 0xe6c6…8790)
2024-05-28 12:05:24 [Relaychain] Received imported block via RPC: #23365318 (0x28bd…88f7 -> 0x1d88…d4f9)
2024-05-28 12:05:25 [Relaychain] Overseer exited with error err=Generated(SubsystemStalled("collator-protocol-subsystem", "signal", "polkadot_node_subsystem_types::OverseerSignal"))
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="network-bridge-rx" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="chain-api" err=FromOrigin { origin: "chain-api", source: Generated(Context("Signal channel is terminated and empty.")) }
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="network-bridge-tx" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
2024-05-28 12:05:25 [Relaychain] error receiving message from subsystem context: Generated(Context("Signal channel is terminated and empty.")) err=Generated(Context("Signal channel is terminated and empty."))
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="availability-recovery" err=FromOrigin { origin: "availability-recovery", source: Generated(Context("Signal channel is terminated and empty.")) }
2024-05-28 12:05:25 [Relaychain] Protocol command streams have been shut down    
2024-05-28 12:05:25 [Relaychain] Essential task `overseer` failed. Shutting down service.    
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="runtime-api" err=Generated(Context("Signal channel is terminated and empty."))
Error: Service(Other("Essential task failed."))

Here is another example, running the released binary from this repository:

./polkadot-parachain  --chain people-kusama --relay-chain-rpc-url wss://rpc.ibp.network/kusama

which reports version 1.12.0-b4016902ac7

and shows similar behaviour.

However, if I use the 1.11.0 release, and the parachain spec from the repo, here, as Paranodes remembers their initial sync, then it seems to work:

2024-05-28 14:19:48 [Relaychain] Received imported block via RPC: #23365460 (0x8b8d…8d67 -> 0xd6ae…6073)
2024-05-28 14:19:48 [Parachain] ♻️  Reorg on #94819,0xd53d…1cc7 to #94819,0x9610…7b2d, common ancestor #94818,0x6fbd…faee    
2024-05-28 14:19:50 [Parachain] 💤 Idle (7 peers), best: #94819 (0x9610…7b2d), finalized #94817 (0x9efd…10ab), ⬇ 6.8kiB/s ⬆ 4.1kiB/s    
2024-05-28 14:19:51 [Relaychain] Received finalized block via RPC: #23365457 (0x09f5…8239 -> 0x951c…cb3c)

Is the initial sync perhaps broken in the latest release?
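To make the difference between the two runs easier to compare, here is a minimal sketch that pulls the best and finalized block numbers out of a [Parachain] status line and prints the gap between them. The finality_gap function is a hypothetical helper written for this illustration, not something shipped with polkadot-parachain, and it assumes the "best: #N ... finalized #N" layout seen in the logs above.

```shell
#!/bin/sh
# Hypothetical helper (not part of polkadot-parachain): print best minus
# finalized for one "[Parachain]" status line, or fail if either is missing.
finality_gap() {
    best=$(printf '%s\n' "$1" | grep -oE 'best: #[0-9]+' | grep -oE '[0-9]+$')
    finalized=$(printf '%s\n' "$1" | grep -oE 'finalized #[0-9]+' | grep -oE '[0-9]+$')
    [ -n "$best" ] && [ -n "$finalized" ] || return 1
    echo $((best - finalized))
}

# Status lines from the two runs above, trimmed to the relevant fields.
finality_gap 'target=#94115 (1 peers), best: #94015, finalized #0'   # 1.12.0 run: prints 94015
finality_gap 'Idle (7 peers), best: #94819, finalized #94817'        # 1.11.0 run: prints 2
```

A gap that keeps growing with the best block, as in the 1.12.0 run, matches the "no blocks appear to be finalized" symptom, while the 1.11.0 run stays within a couple of blocks of the tip.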

https://gist.githubusercontent.com/hitchhooker/61a00eb3e3bda432598351347048af8b/raw/23d0c5d0c1e5ceed5bd2e0dd21e14cde3d38dc3d/gistfile1.txt

root@kppl27:/opt/cumulus# cat cumulus.service
[Unit]
Description="kppl27 endpoint - Cumulus service"
After=network-online.target
Wants=network-online.target

[Service]
User=cumulus
Group=cumulus
ExecStart=/opt/cumulus/cumulus \
  --name "Rotko Networks - kppl27 Endpoint" \
  --chain /opt/cumulus/people-kusama.json \
  --base-path /opt/cumulus \
  --state-pruning archive \
  --blocks-pruning=archive \
  --database paritydb \
  --sync full \
  --listen-addr /ip4/0.0.0.0/tcp/33857 \
  --listen-addr /ip4/0.0.0.0/tcp/34857/ws \
  --public-addr /ip4/27.131.160.106/tcp/33857 \
  --public-addr /ip4/27.131.160.106/tcp/34857/ws \
  --public-addr /dns/kppl27.rotko.net/tcp/33857 \
  --public-addr /dns/kppl27.rotko.net/tcp/34857/ws \
  --public-addr /dns/kppl27.rotko.net/tcp/35857/wss \
  --rpc-port 9857 \
  --prometheus-port 7857 \
  --prometheus-external \
  --relay-chain-rpc-urls ws://192.168.69.24:9324 \
  --wasm-execution Compiled \
  --no-hardware-benchmarks \
  --max-runtime-instances 32 \
  --rpc-max-request-size 16 \
  --rpc-max-response-size 16 \
  --rpc-max-subscriptions-per-connection 512 \
  --rpc-max-connections 10000 \
  --rpc-external \
  --rpc-methods safe \
  --rpc-cors all \
  --allow-private-ipv4

Restart=always
RestartSec=120

[Install]
WantedBy=multi-user.target

Struggling with the same issue on coretime, people and bridgehub.

related: #4648

Thanks for reporting, I will take a look!

Regarding the docker run example against wss://rpc.ibp.network/kusama that eventually kills the node with the SubsystemStalled error in collator-protocol-subsystem and "Essential task `overseer` failed":

This is a known issue and was recently fixed: #4167
In general, we do not recommend using slow public RPC nodes for collation. The goal is that you can run multiple collators in your network and point them to a full node that you run yourself.
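As a rough sketch of that setup, the invocation from the report can be pointed at a relay chain full node you operate instead of a public endpoint. The address ws://127.0.0.1:9944 is an assumption about where such a node would expose its RPC, and parachain_cmd is a hypothetical helper that only prints the command so it can be reviewed first. The --relay-chain-rpc-urls flag is the one used in the service file earlier in this thread.

```shell
#!/bin/sh
# Assumption: a Kusama relay chain full node that you run yourself exposes
# its RPC at this address; override RELAY_RPC_URL to match your setup.
RELAY_RPC_URL="${RELAY_RPC_URL:-ws://127.0.0.1:9944}"

# Hypothetical helper: print the node invocation instead of running it.
parachain_cmd() {
    echo "./polkadot-parachain --chain people-kusama --relay-chain-rpc-urls $RELAY_RPC_URL"
}

parachain_cmd
# To actually start the node: eval "$(parachain_cmd)"
```

Several collators in the same network can share the one self-hosted endpoint by setting the same RELAY_RPC_URL.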

Quick update: I was able to reproduce this issue and have some hints on what might be happening. Will confirm and keep you posted.