slot-based-collator: improve relay chain fork selection
sandreim opened this issue
Long duration testing has revealed that sometimes we use a relay parent on a fork that the relay chain later discards. This happens because we always pick the relay parent as soon as we import an RCB (relay chain block), but due to network latencies the best RCB might actually arrive after that. This results in missing all 3 slots.
Offline discussions revealed some options to improve the situation with some tradeoffs:
- use relay parent at `best RCB - 1`, but tradeoff 6s of extra delay in XCM DMP/HRMP message processing
- offset parachain block production by 1-2s after RCB slot (this seemed to not actually work in tests, but no deep dive was done)
- build on all forks - tradeoff multiplied load of relay chain validators and collator by amount of forks
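
To make the first option concrete, here is a minimal sketch of picking the relay parent a fixed number of blocks behind the best RCB. It is not the collator's actual code: `RelayHeader`, `relay_parent_with_offset` and `example_chain` are hypothetical names, and headers are reduced to plain integers. It only illustrates why `best RCB - 1` survives a fork at the tip: two competing tips share the same parent. Each step back costs one relay chain slot, which is the 6s delay mentioned above, and an offset of 1 would not help against forks deeper than one block.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified relay chain header (illustration only).
#[derive(Clone, Debug, PartialEq, Eq)]
struct RelayHeader {
    hash: u64,
    parent: u64,
    number: u32,
}

/// Walk `offset` ancestors back from the block with hash `best`.
/// Returns `None` if `best` or any ancestor on the way is unknown locally.
fn relay_parent_with_offset(
    headers: &HashMap<u64, RelayHeader>,
    best: u64,
    offset: u32,
) -> Option<&RelayHeader> {
    let mut current = headers.get(&best)?;
    for _ in 0..offset {
        current = headers.get(&current.parent)?;
    }
    Some(current)
}

/// Block 10 (hash 100) with two competing children at height 11
/// (hashes 110 and 111), i.e. a one-block fork at the tip.
fn example_chain() -> HashMap<u64, RelayHeader> {
    [
        RelayHeader { hash: 100, parent: 90, number: 10 },
        RelayHeader { hash: 110, parent: 100, number: 11 },
        RelayHeader { hash: 111, parent: 100, number: 11 },
    ]
    .into_iter()
    .map(|h| (h.hash, h))
    .collect()
}
```

With `example_chain()`, an offset of 1 resolves to the block with hash 100 no matter which of the two tips is currently considered best, so a late-arriving best RCB does not invalidate the chosen relay parent.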
On the testnet this didn't seem like a big problem as these block time spikes were rare, but on mainnet the issue could become more problematic.
Afaik the first two sound fine. And the second sounds wise anyways.
> this seemed to not actually work in tests, but no deep dive was done
This could be various things, but maybe just some timing things not set everywhere?
> tradeoff multiplied load of relay chain validators and collator by amount of forks
We changed how collators learn of parachain blocks, right? This could magnify somewhat if they learned from other collators. If they learn from the relay chain then everything is fine.