Can't sync a kusama-people node from scratch with 1.12, works with 1.11, regression?
rvalle opened this issue · comments
Hi!
I can't sync a Kusama People parachain node. I am running into problems I have never seen when running any other node.
I am using the Docker distribution, polkadot-parachain:1.12.0, with the relay chain RPC interface.
A vanilla start with docker run --rm parity/polkadot-parachain:1.12.0 --chain people-kusama
complains about most (or all, not sure) nodes having a genesis mismatch:
2024-05-28 09:37:31 [Parachain] Report 12D3KooWS7uzh62LChjfbyYGj1U5yGYaKNWMzzh6AAWHiJ5aLYLH: -2147483648 to -2147483648. Reason: Genesis mismatch. Banned, disconnecting.
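To tell whether most or all peers are affected, the reports can be counted per peer. A minimal sketch, assuming the log format shown above; genesis_mismatch_peers is a hypothetical helper, not part of polkadot-parachain, and it reads log text on stdin:

```shell
#!/usr/bin/env bash
# Sketch: print each peer reported for "Genesis mismatch" with its report count.
# genesis_mismatch_peers is a hypothetical helper name, not a node feature.
genesis_mismatch_peers() {
  grep 'Reason: Genesis mismatch' \
    | sed -n 's/.*Report \([A-Za-z0-9]*\):.*/\1/p' \
    | sort | uniq -c | awk '{print $2, $1}'
}

# Log line copied from the report above.
printf '%s\n' \
  '2024-05-28 09:37:31 [Parachain] Report 12D3KooWS7uzh62LChjfbyYGj1U5yGYaKNWMzzh6AAWHiJ5aLYLH: -2147483648 to -2147483648. Reason: Genesis mismatch. Banned, disconnecting.' \
  | genesis_mismatch_peers
```

On a running container the same function could be fed from docker logs, with the container name being whatever you started it as.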
I then restricted the node to the boot nodes only, using the reserved-only flag
with the boot nodes as reserved nodes. With that I do get parachain blocks, but no finalization.
Eventually I reach the top of the chain, also using the relay chain RPC interface:
2024-05-28 09:40:46 [Parachain] ⚙️ Preparing 0.0 bps, target=#94115 (1 peers), best: #94015 (0x2dee…a9c6), finalized #0 (0xc1af…8b3f), ⬇ 12 B/s ⬆ 26 B/s
Still, no blocks appear to be finalized,
and eventually this other warning repeats constantly:
2024-05-28 09:40:42 [Parachain] Event distribution channel has reached its limit. This can lead to missed notifications. error=TrySendError { kind: Full }
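The stuck finality is visible in the status lines themselves. A small sketch that extracts the best and finalized block numbers from one status line and prints the gap, assuming the line format shown above; finality_lag is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Sketch: report how far finality lags behind the best block in a status line.
# finality_lag is a hypothetical helper, not part of polkadot-parachain.
finality_lag() {
  local line="$1" best finalized
  # Pull the numbers out of "best: #N" and "finalized #N".
  best=$(printf '%s\n' "$line" | sed -n 's/.*best: #\([0-9][0-9]*\).*/\1/p')
  finalized=$(printf '%s\n' "$line" | sed -n 's/.*finalized #\([0-9][0-9]*\).*/\1/p')
  # Not a status line: print nothing and signal failure.
  if [ -z "$best" ] || [ -z "$finalized" ]; then
    return 1
  fi
  echo "best=$best finalized=$finalized lag=$((best - finalized))"
}

# Status line copied from the report above.
finality_lag '2024-05-28 09:40:46 [Parachain] ⚙️ Preparing 0.0 bps, target=#94115 (1 peers), best: #94015 (0x2dee…a9c6), finalized #0 (0xc1af…8b3f), ⬇ 12 B/s ⬆ 26 B/s'
# prints: best=94015 finalized=0 lag=94015
```

A lag that keeps growing while finalized stays at #0 matches the symptom described here. For comparison, the healthy 1.11.0 status line later in this report shows a lag of 2.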
I am not sure what is going on. I am using the new default --prune archive-canonical
and I also tried the different sync modes (fast, warp), but nothing seems to make a difference.
I have also tried the polkadot-collator image, but it does not seem to include the Kusama People spec.
What am I missing?
Here is an example command that won't work:
docker run \
parity/polkadot-parachain:1.12.0 \
--chain people-kusama \
--relay-chain-rpc-url wss://rpc.ibp.network/kusama
It eventually fails, killing the node, with:
2024-05-28 12:05:23 [Parachain] ⚙️ Syncing 391.4 bps, target=#94755 (7 peers), best: #85939 (0x294d…66f1), finalized #0 (0xc1af…8b3f), ⬇ 2.2MiB/s ⬆ 1.3kiB/s
2024-05-28 12:05:24 [Relaychain] Received imported block via RPC: #23365318 (0x28bd…88f7 -> 0xe6c6…8790)
2024-05-28 12:05:24 [Relaychain] Received imported block via RPC: #23365318 (0x28bd…88f7 -> 0x1d88…d4f9)
2024-05-28 12:05:25 [Relaychain] Overseer exited with error err=Generated(SubsystemStalled("collator-protocol-subsystem", "signal", "polkadot_node_subsystem_types::OverseerSignal"))
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="network-bridge-rx" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="chain-api" err=FromOrigin { origin: "chain-api", source: Generated(Context("Signal channel is terminated and empty.")) }
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="network-bridge-tx" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
2024-05-28 12:05:25 [Relaychain] error receiving message from subsystem context: Generated(Context("Signal channel is terminated and empty.")) err=Generated(Context("Signal channel is terminated and empty."))
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="availability-recovery" err=FromOrigin { origin: "availability-recovery", source: Generated(Context("Signal channel is terminated and empty.")) }
2024-05-28 12:05:25 [Relaychain] Protocol command streams have been shut down
2024-05-28 12:05:25 [Relaychain] Essential task `overseer` failed. Shutting down service.
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="runtime-api" err=Generated(Context("Signal channel is terminated and empty."))
Error: Service(Other("Essential task failed."))
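In a shutdown like this, only the first error names the subsystem that actually stalled. The "Signal channel is terminated and empty" lines are the remaining subsystems exiting after the overseer has gone. A sketch for pulling that name out of a log, assuming the SubsystemStalled format shown above; stalled_subsystem is a hypothetical helper that reads log text on stdin:

```shell
#!/usr/bin/env bash
# Sketch: print the name of the subsystem reported as stalled by the overseer.
# stalled_subsystem is a hypothetical helper, not part of polkadot-parachain.
stalled_subsystem() {
  # The first quoted argument of SubsystemStalled(...) is the subsystem name.
  sed -n 's/.*SubsystemStalled("\([^"]*\)".*/\1/p'
}

# Log line copied from the report above.
printf '%s\n' \
  '2024-05-28 12:05:25 [Relaychain] Overseer exited with error err=Generated(SubsystemStalled("collator-protocol-subsystem", "signal", "polkadot_node_subsystem_types::OverseerSignal"))' \
  | stalled_subsystem
# prints: collator-protocol-subsystem
```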
Here is another example, running the released binary from this repository:
./polkadot-parachain --chain people-kusama --relay-chain-rpc-url wss://rpc.ibp.network/kusama
which reports version 1.12.0-b4016902ac7
and shows similar behaviour.
However, if I use the 1.11.0 release and the parachain spec from the repo, here, as Paranodes remembers from their initial sync, then it seems to work:
2024-05-28 14:19:48 [Relaychain] Received imported block via RPC: #23365460 (0x8b8d…8d67 -> 0xd6ae…6073)
2024-05-28 14:19:48 [Parachain] ♻️ Reorg on #94819,0xd53d…1cc7 to #94819,0x9610…7b2d, common ancestor #94818,0x6fbd…faee
2024-05-28 14:19:50 [Parachain] 💤 Idle (7 peers), best: #94819 (0x9610…7b2d), finalized #94817 (0x9efd…10ab), ⬇ 6.8kiB/s ⬆ 4.1kiB/s
2024-05-28 14:19:51 [Relaychain] Received finalized block via RPC: #23365457 (0x09f5…8239 -> 0x951c…cb3c)
Is the initial sync perhaps broken in the latest release?
root@kppl27:/opt/cumulus# cat cumulus.service
[Unit]
Description="kppl27 endpoint - Cumulus service"
After=network-online.target
Wants=network-online.target
[Service]
User=cumulus
Group=cumulus
ExecStart=/opt/cumulus/cumulus \
--name "Rotko Networks - kppl27 Endpoint" \
--chain /opt/cumulus/people-kusama.json \
--base-path /opt/cumulus \
--state-pruning archive \
--blocks-pruning=archive \
--database paritydb \
--sync full \
--listen-addr /ip4/0.0.0.0/tcp/33857 \
--listen-addr /ip4/0.0.0.0/tcp/34857/ws \
--public-addr /ip4/27.131.160.106/tcp/33857 \
--public-addr /ip4/27.131.160.106/tcp/34857/ws \
--public-addr /dns/kppl27.rotko.net/tcp/33857 \
--public-addr /dns/kppl27.rotko.net/tcp/34857/ws \
--public-addr /dns/kppl27.rotko.net/tcp/35857/wss \
--rpc-port 9857 \
--prometheus-port 7857 \
--prometheus-external \
--relay-chain-rpc-urls ws://192.168.69.24:9324 \
--wasm-execution Compiled \
--no-hardware-benchmarks \
--max-runtime-instances 32 \
--rpc-max-request-size 16 \
--rpc-max-response-size 16 \
--rpc-max-subscriptions-per-connection 512 \
--rpc-max-connections 10000 \
--rpc-external \
--rpc-methods safe \
--rpc-cors all \
--allow-private-ipv4
Restart=always
RestartSec=120
[Install]
WantedBy=multi-user.target
Struggling with the same issue on coretime, people and bridgehub.
related: #4648
Thanks for reporting, I will take a look!
Here is an example command that won't work:
docker run \
parity/polkadot-parachain:1.12.0 \
--chain people-kusama \
--relay-chain-rpc-url wss://rpc.ibp.network/kusama
It eventually fails, killing the node, with:
2024-05-28 12:05:23 [Parachain] ⚙️ Syncing 391.4 bps, target=#94755 (7 peers), best: #85939 (0x294d…66f1), finalized #0 (0xc1af…8b3f), ⬇ 2.2MiB/s ⬆ 1.3kiB/s
2024-05-28 12:05:24 [Relaychain] Received imported block via RPC: #23365318 (0x28bd…88f7 -> 0xe6c6…8790)
2024-05-28 12:05:24 [Relaychain] Received imported block via RPC: #23365318 (0x28bd…88f7 -> 0x1d88…d4f9)
2024-05-28 12:05:25 [Relaychain] Overseer exited with error err=Generated(SubsystemStalled("collator-protocol-subsystem", "signal", "polkadot_node_subsystem_types::OverseerSignal"))
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="network-bridge-rx" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="chain-api" err=FromOrigin { origin: "chain-api", source: Generated(Context("Signal channel is terminated and empty.")) }
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="network-bridge-tx" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
2024-05-28 12:05:25 [Relaychain] error receiving message from subsystem context: Generated(Context("Signal channel is terminated and empty.")) err=Generated(Context("Signal channel is terminated and empty."))
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="availability-recovery" err=FromOrigin { origin: "availability-recovery", source: Generated(Context("Signal channel is terminated and empty.")) }
2024-05-28 12:05:25 [Relaychain] Protocol command streams have been shut down
2024-05-28 12:05:25 [Relaychain] Essential task `overseer` failed. Shutting down service.
2024-05-28 12:05:25 [Relaychain] subsystem exited with error subsystem="runtime-api" err=Generated(Context("Signal channel is terminated and empty."))
Error: Service(Other("Essential task failed."))
This is a known issue and was recently fixed: #4167
In general we do not recommend using slow public RPC nodes for collation. The intended setup is that you run multiple collators in your network and point them at a full node that you run yourself.
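As a rough sketch of that setup, the command from the report can be pointed at a relay chain node you operate instead of a public endpoint. The default URL below, ws://127.0.0.1:9944, is an assumption for illustration (a local Kusama full node exposing RPC on that port), and build_node_cmd is a hypothetical helper that only prints the command rather than running it:

```shell
#!/usr/bin/env bash
# Sketch: print the node command for a relay chain RPC endpoint you run yourself.
# build_node_cmd is a hypothetical helper; the default URL is an assumed local
# Kusama full node, so adjust it to your own setup.
build_node_cmd() {
  local relay_url="${1:-ws://127.0.0.1:9944}"
  echo "./polkadot-parachain --chain people-kusama --relay-chain-rpc-url $relay_url"
}

# Dry run: show the command that would be started.
build_node_cmd
build_node_cmd ws://192.168.69.24:9324
```

The second call reuses the private relay endpoint from the systemd unit posted earlier in this thread.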
Quick update: I was able to reproduce this issue and have some hints on what might be happening. Will confirm and keep you posted.