m3db / m3

M3 monorepo - Distributed TSDB, Aggregator and Query Engine, Prometheus Sidecar, Graphite Compatible, Metrics Platform

Home Page: https://m3db.io/

[aggregator] All nodes should have `CanLead == true` before the first ever flush

justinjc opened this issue

When a new m3aggregator instance is created, all nodes should have `CanLead == true`. Here, a new instance means one or more clusters in a new environment, each with its own flush times in KV.

In general, an m3aggregator node can lead (`CanLead == true`) once we know it has accumulated all data since the last flush time T. Data before T is already safe, so a node that has all data since T holds everything required to lead the cluster.

What if no "last flush time" exists? When a new instance is created, no flush has happened yet, so no flush times have been persisted to the KV store. Today, in this scenario, followers are deemed `CanLead == false`. However, this is quite arbitrary. When a cluster is first created, all nodes campaign for election and any of them could become the leader (the first node to come up does not necessarily win, e.g. because of raft's randomized election timeouts). Once a leader is elected, every other node becomes a follower with `CanLead == false`, even though the amount of data it has accumulated has not changed. Since any follower could have been the leader and may have lost the election purely by chance, followers should remain `CanLead == true` until the first flush has been persisted to KV.