Add a `Terminate` message to ABA?

Question

afck opened this issue 6 years ago

I think our current Agreement implementation is not guaranteed to terminate: It will eventually output, but after that it could hang.

Let's say the f faulty nodes just never send any messages, one node, Alice, decides and outputs 0️⃣ in round 1, and the coin shows 0️⃣ in round 2 again. Then Alice will terminate in round 2 and honest Bob will output in round 2. But Bob must continue until the coin comes up 0️⃣ again. He will be stuck in round 3, where he won't receive more than N - f - 1 Aux messages.
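
The counting in this scenario can be sketched as follows. This is only an illustration, assuming the usual threshold of N - f Aux messages per round; the function names `aux_threshold_reached` and `aux_available` are hypothetical and not taken from any implementation.

```rust
/// Returns true if a node that has received `aux_received` Aux messages
/// (its own included) has enough to move past the Aux step.
/// Assumes the usual threshold of N - f messages.
fn aux_threshold_reached(n: usize, f: usize, aux_received: usize) -> bool {
    aux_received >= n - f
}

/// Number of Aux messages a node can still collect in a round in which
/// `silent_faulty` faulty nodes never send anything and `terminated`
/// honest nodes have already stopped.
fn aux_available(n: usize, silent_faulty: usize, terminated: usize) -> usize {
    n - silent_faulty - terminated
}
```

With N = 4 and f = 1, `aux_available(4, 1, 1)` is 2 once Alice has stopped, which is below the threshold of 3, so Bob cannot finish round 3.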

Am I missing something? But even if I am, it would be nice to be able to terminate immediately when we output.

We should add a new message type AgreementContent::Terminate(bool), which counts as Aux, BVal and Conf for all following rounds. Also, if a node receives f + 1 of them, it can immediately terminate, too.
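
A minimal sketch of the bookkeeping this proposal implies, under stated assumptions: only `Terminate(bool)` is named above, while the other enum variants, the `TerminateTracker` type and its methods are hypothetical, and `Conf` is left out for brevity.

```rust
use std::collections::BTreeSet;

/// Hypothetical message contents. Only `Terminate` is named in this issue;
/// the other variants mirror the message kinds it is meant to stand in for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AgreementContent {
    BVal(bool),
    Aux(bool),
    Terminate(bool),
}

/// Per-node bookkeeping for the proposed rule (illustrative only).
struct TerminateTracker {
    f: usize,
    /// Senders of `Terminate(false)` and `Terminate(true)`, indexed by value.
    senders: [BTreeSet<usize>; 2],
}

impl TerminateTracker {
    fn new(f: usize) -> Self {
        TerminateTracker {
            f,
            senders: [BTreeSet::new(), BTreeSet::new()],
        }
    }

    /// Records a message. Returns `Some(v)` once f + 1 distinct nodes have
    /// sent `Terminate(v)`, i.e. when this node may terminate with `v`.
    fn handle(&mut self, sender: usize, msg: AgreementContent) -> Option<bool> {
        if let AgreementContent::Terminate(v) = msg {
            let set = &mut self.senders[v as usize];
            set.insert(sender);
            if set.len() > self.f {
                return Some(v);
            }
        }
        None
    }

    /// Number of nodes whose `Terminate(v)` would be counted as an Aux(v)
    /// (and likewise BVal(v)) in every later round.
    fn implied_aux(&self, v: bool) -> usize {
        self.senders[v as usize].len()
    }
}
```

Senders are stored in sets so that a repeated `Terminate` from the same node is counted only once.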

Mostéfaoui, Moumen and Raynal wrote a follow-up paper where they deal with termination. There are some other modifications to ABA in there which we should also have a closer look at. Also: How do the Python, Erlang and Go implementations handle the above? If this is indeed a problem, we should file bugs there, too.

Andrew Miller · Answer 1 · Thu Jun 07 2018 20:43:18 GMT+0800 (China Standard Time)

HoneyBadgerBFT-Python currently exhibits this "hang forever after giving output" feature (under failure cases).

It is not clear to me that just waiting for f+1 terminate(v) messages is a safe condition for termination. It could be that f of those come from faulty nodes, and failing to send more coin values means that other honest nodes may get stuck. It's also possible that this is safe after all: perhaps if even one honest node decides in a round, then at least f other honest nodes also decide in the same round?

I think the optimistic approach in amiller/HoneyBadgerBFT#63 does seem to permit terminating quickly, though, after sending the terminate(bool) message and one additional coin value.

I have not looked into the follow-up paper, very curious about it!

Andreas Fackler · Answer 2 · Thu Jun 07 2018 20:55:01 GMT+0800 (China Standard Time)

Thank you for the quick reply! Yes, I'm hoping terminating on f + 1 Terminates is safe:

The Terminate(v) will count as BVal(v) and Aux(v) and Conf({v}), so for those messages it won't be a problem anyway. For the coin, it should work, too: If t honest nodes have terminated, either:

t >= f + 1, so the other honest nodes will all receive f + 1 Terminate(v)s and terminate, too, or
t <= f, so there are still N - t - f >= N - 2 * f >= f + 1 running honest nodes and therefore f + 1 coin shares.

(But I think you're right and one of the cases isn't even possible.)
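
The two cases above can be sanity-checked by brute force. The sketch below assumes N >= 3f + 1 (which the step N - 2 * f >= f + 1 relies on); the function names are hypothetical, and the check covers small parameters only, so it is not a proof.

```rust
/// Checks the two-case argument for one parameter choice:
/// `n` nodes in total, `f` faulty, `t` honest nodes already terminated.
/// Returns true if either f + 1 honest nodes have sent Terminate, or at
/// least f + 1 honest nodes are still running and can provide coin shares.
fn progress_guaranteed(n: usize, f: usize, t: usize) -> bool {
    let honest = n - f;
    let running_honest = honest - t;
    t >= f + 1 || running_honest >= f + 1
}

/// Exhaustively checks every f <= max_f, every n with 3f + 1 <= n <= max_n,
/// and every t with 0 <= t <= n - f.
fn check_all(max_f: usize, max_n: usize) -> bool {
    (0..=max_f).all(|f| {
        (3 * f + 1..=max_n).all(|n| (0..=n - f).all(|t| progress_guaranteed(n, f, t)))
    })
}
```

Below the bound the argument can fail: with N = 3, f = 1 and t = 1, neither case applies.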

I also agree with the proposed optimizations (fixed 0 and 1 rounds), but the Terminate message should work with or without them.

Vladimir Komendantskiy · Answer 3 · Fri Jun 08 2018 00:28:29 GMT+0800 (China Standard Time)

I think that expedited termination is an excellent improvement. Terminating after receiving f + 1 Terminate(v) messages (possibly from different rounds of Agreement) is safe because at least one of those messages was sent by an honest node, and that node should have either received a Terminate(v) from another honest node or reached termination through the proven-correct process of waiting for an epoch with a repeating coin value.

Ethan MacBrough · Answer 4 · Fri Jun 08 2018 00:28:38 GMT+0800 (China Standard Time)

I agree this is necessary if the design is set up so that output and termination are coupled. Since Mostéfaoui et al.'s algorithm can't provide any agreement on a certain round number to terminate at, there's always the chance that all but one honest node terminate, and then that honest node will be left worrying whether some other honest node hasn't terminated, but can't progress far enough to ensure this.

I didn't believe it at first, but I also agree that this solution works. My original suggestion for the terminate message was similar but instead of terminating after f+1 terminate messages, you send your own terminate message, and only terminate after receiving n-f terminate messages. But your point is that you don't need to wait for the extra terminate messages because after everyone has come to agreement on v you only need f+1 honest nodes to participate. Saves an extra message round too :-)

EDIT: Also it definitely is possible for just one honest node to output in some round, even with no faulty nodes. If half of the nodes vote 0 and the other half vote 1, then it's possible for 2f+1 nodes to AUX 0 while f AUX 1. Then one node can see the AUX(0) messages first while all the others see a mix and get bin_values={0,1}, and the coin can land on 0 allowing only that one to terminate. By very similar logic it's also possible for only a single node to not output in some round.
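
The scenario in the EDIT can be replayed with a small sketch of the end-of-round rule. It assumes the rule from the Mostéfaoui et al. protocol as commonly described (decide b only if all Aux values seen equal b and the coin also shows b; otherwise continue); `RoundOutcome` and `finish_round` are hypothetical names.

```rust
/// Outcome of one round for a single node (illustrative names).
#[derive(Debug, PartialEq, Eq)]
enum RoundOutcome {
    /// The node outputs this value.
    Decide(bool),
    /// The node carries this estimate into the next round without outputting.
    Continue(bool),
}

/// Assumed end-of-round rule: a node decides `b` only if every Aux value it
/// saw equals `b` and the coin also lands on `b`.
/// `aux_seen` holds the values of the first N - f Aux messages it received.
fn finish_round(aux_seen: &[bool], coin: bool) -> RoundOutcome {
    let saw_false = aux_seen.contains(&false);
    let saw_true = aux_seen.contains(&true);
    match (saw_false, saw_true) {
        (true, false) if !coin => RoundOutcome::Decide(false),
        (false, true) if coin => RoundOutcome::Decide(true),
        // A single value that differs from the coin: keep it as the estimate.
        (true, false) => RoundOutcome::Continue(false),
        (false, true) => RoundOutcome::Continue(true),
        // Mixed (or no) values: adopt the coin as the new estimate.
        _ => RoundOutcome::Continue(coin),
    }
}
```

With N = 4 and f = 1, a node whose first three Aux messages are all 0 decides when the coin lands on 0, while a node that sees a mix such as 0, 1, 0 only continues with estimate 0.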