rabbitmq / ra

A Raft implementation for Erlang and Elixir that strives to be efficient and make it easier to use multiple Raft clusters in a single system.

ra_log_snapshot fails CRC check on OTP 26

kjnilsson opened this issue

This happens because the snapshot metadata map isn't replicated as binary data. Instead it is re-serialised on the receiver, which may therefore calculate a different checksum over that data, as the term_to_binary representation of a map isn't deterministic between OTP versions. As snapshot replication includes binary data read from files, we really ought to perform checksum validation if at all possible.
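To illustrate the failure mode, here is a minimal sketch. It is not Ra's actual code: the module name, function names and tuple layout are hypothetical. It contrasts verifying the checksum against the metadata bytes exactly as the sender produced them with re-serialising the decoded map on the receiver, which only matches if both sides' term_to_binary/1 produce identical bytes.

```erlang
-module(meta_crc_sketch).
-export([encode/1, verify_raw/1, verify_reserialised/1]).

%% Sender side: serialise the metadata map once and checksum those exact bytes.
encode(Meta) when is_map(Meta) ->
    MetaBin = term_to_binary(Meta),
    {erlang:crc32(MetaBin), MetaBin}.

%% Receiver side, robust variant: validate the bytes as received, then decode.
%% The result does not depend on how the receiver would encode the map.
verify_raw({Crc, MetaBin}) when is_binary(MetaBin) ->
    case erlang:crc32(MetaBin) of
        Crc -> {ok, binary_to_term(MetaBin)};
        _ -> {error, checksum_mismatch}
    end.

%% Receiver side, fragile variant: the map arrives as a term and is re-encoded.
%% This only passes if the receiver's term_to_binary/1 yields the same bytes
%% as the sender's, which is not guaranteed across OTP releases.
verify_reserialised({Crc, Meta}) when is_map(Meta) ->
    case erlang:crc32(term_to_binary(Meta)) of
        Crc -> {ok, Meta};
        _ -> {error, checksum_mismatch}
    end.
```

On a single node both variants succeed for the same input. The fragile variant can only fail when sender and receiver encode the map differently, which is the situation reported here.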

Options:

  1. Disable checksum validation completely for ra_log_snapshot (this won't work when sending a snapshot to an old member that still performs the validation).
  2. Do option 1, but also create a new snapshot implementation that does the right thing. This would require careful management of the snapshot module configuration, as it isn't possible to have Ra clusters with different snapshot implementations.
  3. Replicate the entire snapshot file so that it includes the original binary representation of the metadata (and thus the CRC check isn't subject to OTP variations in the term_to_binary output).
    • To do this, the sender would need to discover dynamically whether the receiver has the new code that can handle the entire file contents rather than just the machine data portion. We already perform an RPC to discover machine version compatibility, so we can include a check for this as well.
    • At the receiving end, the new implementation could pattern match on the first data chunk received to see if it includes the magic value and version (<<"RASN", Version:32/integer>>). If so, the sender also has the new code and all is well. If not, the sender has the old logic and the receiver falls back to the old behaviour of serialising the metadata map, but without performing checksum validation. Instead we have to assume the data is fine, then calculate and write a new checksum value into the snapshot file.
    • This approach still has the shortcoming that a snapshot sent to a member running the old code may fail validation due to term_to_binary output differences, but this can't be completely avoided.
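
The receiver-side detection described in option 3 could look roughly like the following. This is a sketch only: the module name, function name and return tuples are hypothetical, and nothing beyond the magic value and 32-bit version is assumed about the snapshot file layout.

```erlang
-module(snapshot_chunk_sketch).
-export([classify_first_chunk/1]).

%% Inspect the first chunk of an incoming snapshot transfer.
%% A chunk starting with the "RASN" magic and a 32-bit version means the
%% sender shipped the whole snapshot file, so its stored checksum can be
%% validated against the original bytes.
classify_first_chunk(<<"RASN", Version:32/integer, _/binary>> = Chunk) ->
    {full_file, Version, Chunk};
%% Anything else is treated as machine data from a sender running the old
%% code: the receiver serialises the metadata itself, skips validation and
%% writes a freshly calculated checksum.
classify_first_chunk(Chunk) when is_binary(Chunk) ->
    {legacy_data, Chunk}.
```

A legacy chunk that happened to begin with the same bytes as the magic value would be misclassified. Whether that can occur depends on how the machine data is encoded, which this sketch does not assume.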

To do this well, the ra_snapshot behaviour API could do with some modifications. ra_snapshot isn't formally part of the public API of Ra, so we should be able to make modifications without breaking anyone's code. The only use I found on GitHub is in mnevis, which is an abandoned project under our own control.

Option 3 seems promising, but is it future-proof in case term_to_binary() changes again?

I think it is, because a snapshot file is generated once and for all (it contains the result of commands which were or will be dropped from the WAL in the given term), but I'm not confident enough in my understanding of Ra to be sure.