efficient / epaxos

Home Page: http://efficient.github.io/epaxos/

Error in the replica message processing in mencius.go

PasinduTennage opened this issue

There is a design error in the Mencius message-processing logic.

```go
select {
	case propose := <-r.ProposeChan:
		//got a Propose from a client
		dlog.Printf("Proposal with id %d\n", propose.CommandId)
		r.handlePropose(propose)
		break

	case skipS := <-r.skipChan:
		skip := skipS.(*menciusproto.Skip)
		//got a Skip from another replica
		dlog.Printf("Skip for instances %d-%d\n", skip.StartInstance, skip.EndInstance)
		r.handleSkip(skip)

	case prepareS := <-r.prepareChan:
		prepare := prepareS.(*menciusproto.Prepare)
		//got a Prepare message
		dlog.Printf("Received Prepare from replica %d, for instance %d\n", prepare.LeaderId, prepare.Instance)
		r.handlePrepare(prepare)
		break

	case acceptS := <-r.acceptChan:
		accept := acceptS.(*menciusproto.Accept)
		//got an Accept message
		dlog.Printf("Received Accept from replica %d, for instance %d\n", accept.LeaderId, accept.Instance)
		r.handleAccept(accept)
		break

	case commitS := <-r.commitChan:
		commit := commitS.(*menciusproto.Commit)
		//got a Commit message
		dlog.Printf("Received Commit from replica %d, for instance %d\n", commit.LeaderId, commit.Instance)
		r.handleCommit(commit)
		break

	case prepareReplyS := <-r.prepareReplyChan:
		prepareReply := prepareReplyS.(*menciusproto.PrepareReply)
		//got a Prepare reply
		dlog.Printf("Received PrepareReply for instance %d\n", prepareReply.Instance)
		r.handlePrepareReply(prepareReply)
		break

	case acceptReplyS := <-r.acceptReplyChan:
		acceptReply := acceptReplyS.(*menciusproto.AcceptReply)
		//got an Accept reply
		dlog.Printf("Received AcceptReply for instance %d\n", acceptReply.Instance)
		r.handleAcceptReply(acceptReply)
		break
	// (excerpt ends here)
}
```

Mencius requires FIFO channels between nodes, and this implementation provides them correctly. However, when a message arrives from a node, it is pushed to a channel specific to its message type, and the receiver then processes messages across those channels in non-FIFO order. The following example shows how this design can break safety.

Assume there are 3 nodes: A, B, and C. Node A first sends an Accept message and later sends a Propose message. B receives both messages in the order A sent them. However, on receipt, B pushes them to two separate channels, and another goroutine reads those channels with the `select` statement shown above. When more than one case is ready, Go's `select` chooses among them pseudo-randomly, so the arrival order across channels is not preserved.

The protocol is violated if B processes the Propose message first, which this design allows. This is a problem in Mencius because each node derives piggybacked information from the messages it receives, so messages must be processed in exactly the order in which the sender sent them.
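The reordering can be reproduced outside the replica with a minimal sketch. The `acceptChan` and `proposeChan` below are simplified, string-typed stand-ins (not the replica's real channels), and `countReorderings` is a hypothetical helper written only for this illustration. It enqueues Accept before Propose and counts how often `select` takes Propose first.

```go
package main

import "fmt"

// countReorderings simulates a receiver that demultiplexes an in-order stream
// (Accept first, then Propose) into two per-type channels and then reads them
// with select. It returns how many trials handled Propose before Accept.
func countReorderings(trials int) int {
	reordered := 0
	for i := 0; i < trials; i++ {
		// Simplified stand-ins for the per-type channels (assumption).
		acceptChan := make(chan string, 1)
		proposeChan := make(chan string, 1)

		// Messages arrive in sender order and are pushed to per-type channels.
		acceptChan <- "Accept"
		proposeChan <- "Propose"

		// Both cases are ready, so select picks one pseudo-randomly.
		select {
		case <-acceptChan:
			// Sender order preserved in this trial.
		case <-proposeChan:
			// Propose handled before the earlier Accept.
			reordered++
		}
	}
	return reordered
}

func main() {
	n := countReorderings(1000)
	fmt.Printf("Propose processed before Accept in %d of 1000 trials\n", n)
}
```

The exact count varies from run to run, but it should be neither 0 nor 1000, which is enough to show that per-type channels do not preserve the sender's order.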

A fix would be to use a single channel for all types of replica messages, so that the receiver processes them in arrival order.
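As a rough sketch of that idea, and not the repository's code, the receiver could read one shared channel and dispatch with a type switch. The `Accept` and `Propose` structs, `msgChan`, and `processInOrder` below are hypothetical names introduced for illustration. The real handlers would be called where the sketch only records a label.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the real message types.
type Accept struct{ Instance int32 }
type Propose struct{ CommandId int32 }

// processInOrder drains a single shared channel and dispatches on the
// dynamic type of each message, so handling order equals arrival order.
func processInOrder(msgChan <-chan interface{}) []string {
	var order []string
	for m := range msgChan {
		switch msg := m.(type) {
		case *Accept:
			// The real code would call its Accept handler here.
			order = append(order, fmt.Sprintf("Accept(%d)", msg.Instance))
		case *Propose:
			// The real code would call its Propose handler here.
			order = append(order, fmt.Sprintf("Propose(%d)", msg.CommandId))
		}
	}
	return order
}

func main() {
	msgChan := make(chan interface{}, 2)
	msgChan <- &Accept{Instance: 7}   // sent first
	msgChan <- &Propose{CommandId: 1} // sent second
	close(msgChan)
	fmt.Println(processInOrder(msgChan))
}
```

Because a Go channel delivers values in the order they were sent, a reader goroutine that enqueues each connection's messages as they arrive keeps the per-sender order intact, even when several senders share the channel.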

Thanks