DataDog / kafka-kit

Kafka storage rebalancing, automated replication throttle, cluster API and more

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[topicmappr] Optionally relax rack constraint to best effort

davidrusu opened this issue · comments

Sometimes you don't have 3 racks available, but we still want 3 replicas that are spread out across available racks as much as possible.

How would you feel about introducing an option to relax the rack safety constraint.

This could be doable; I could see a flag that indicates something along the lines of "minimum unique rack.ids" where it defaults to "all" (the current behavior) or could take a value like 2 to satisfy the case you're describing. I'll review this and play with some ideas.

@jamiealquiza I've got some time today at work to play with this, I can experiment to see if a useful approach comes to me

Some thoughts as I'm working.

I want to refactor the Constraint struct to look like this:

type Constraints struct {
	diskSpace             float64        // a broker must have at least this much available space to be eligible
	remainingSpotsPerRack map[string]int // a broker must have a spot left in it's rack to be eligible
	allocatedBrokers      map[int]bool   // a broker must not be in this list to be eligible
}

remainingSpotsPerRack is a map from a rack to how many brokers we want in this rack.

The current behavior would be to first we choose len(p.Replicas) racks to each constraint, then we set the c.remainingSpotsPerRack[rack] = 1 for each rack we chose for this replica.

the new behavior would add the needed spots to racks evenly across all constraints.

Now c.passes(b) will check that c.remainingSpotsPerRack[b.rack] > 0 and c.add(b) will do c.remainintSpotsPerRack[b.rack] -= 1

its a bit of a change 🤔

More thinking, we currently rebuild constraints on each pass through the placement strategy. I would like to see us build a list of constraints up front and then loop through, adding brokers, until all constraints are satisfied. (not entirely a fully formed thought yet, but I'll continue experimenting)

its a bit of a change

Hah, yes, the constraints stuff in general is one of those areas I quickly built up to do a particular thing but has been on my radar to cleanup (so we can more easily make changes like this)

👋Playing with some ideas and rewriting bits of the constraints stuff today, I'll share a branch when I have something working

Got a few things done on this so far in this branch:

  • Adds a new method to Constraints called SelectBroker which takes a ConstraintsParams type, which includes a MinUniqueRackIDs field. This new method will replace the previous BestCandidate.
  • Setting MinUniqueRackIDs to 0 retains the current behavior (all rack IDs must be unique). Any config >0 will prevent new brokers from failing the constraints check as long as the replica set has hit whatever value was configured.
  • The rebuild functions now use the new SelectBroker method when filling replica set slots.

The last functional todo is now just to add a flag to topicmappr that allows the user to control the MinUniqueRackIDs param. I'll add this later tonight and get testing rolling on this. Should have this cleaned up and ready to try out tomorrow.