illinois-cs241 / broadway-api

This is the old repo for the Broadway API. Please see the new repo for the newest version of Broadway: https://github.com/illinois-cs241/broadway

Handle False Failure Detections

ayushr2 opened this issue

In an asynchronous system, it is almost impossible for a failure detector to guarantee both safety and liveness. As a result, a node that is still alive can be misclassified as dead.

We currently mark a node as dead if it does not send a heartbeat within 20 seconds. A machine can hang for longer than that and then continue executing. So in the heartbeat handler, the grading job handler, and the grading result handler, we should check whether the request is coming from a node that is marked dead. If so, mark it as alive again.
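
A minimal sketch of the proposed behavior, not the actual Broadway code. `NodeRegistry`, `sweep`, `touch`, and `is_alive` are hypothetical names made up for illustration. The only value taken from this issue is the 20-second timeout. The idea is that each of the three handlers calls `touch` first, which revives the node if it had been marked dead.

```python
import time

HEARTBEAT_TIMEOUT = 20.0  # seconds, the timeout described in this issue


class NodeRegistry:
    """Hypothetical in-memory tracker of worker node liveness."""

    def __init__(self, timeout=HEARTBEAT_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock      # injectable so the logic can be tested
        self.last_seen = {}     # node_id -> time of the last request
        self.dead = set()       # node ids currently marked dead

    def sweep(self):
        """Mark every node that has been silent for longer than the timeout as dead."""
        now = self.clock()
        for node_id, seen in self.last_seen.items():
            if now - seen > self.timeout:
                self.dead.add(node_id)

    def touch(self, node_id):
        """Record a request from node_id; return True if the node was revived.

        Intended to be called at the top of the heartbeat, grading job,
        and grading result handlers.
        """
        revived = node_id in self.dead
        self.dead.discard(node_id)
        self.last_seen[node_id] = self.clock()
        return revived

    def is_alive(self, node_id):
        return node_id in self.last_seen and node_id not in self.dead
```

When `touch` returns `True`, the handler knows the earlier detection was a false positive. What to do with a job that was reassigned while the node was considered dead is a separate decision that this sketch does not cover.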