kubernetes-retired / kube-batch

A batch scheduler of kubernetes for high performance workload, e.g. AI/ML, BigData, HPC

Consider multiple preemptees of same job in preemptableFn of gang plugin

zionwu opened this issue · comments

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
In the gang plugin, preemptableFn does not handle the case where several preemptees belong to the same job. It checks each preemptee against job.ReadyTaskNum() - 1, as if that preemptee were the only task being preempted from its job: https://github.com/kubernetes-sigs/kube-batch/blob/master/pkg/scheduler/plugins/gang/gang.go#L77

	for _, preemptee := range preemptees {
		job := ssn.Jobs[preemptee.Job]
		occupid := job.ReadyTaskNum()
		preemptable := job.MinAvailable <= occupid-1 || job.MinAvailable == 1
		.......

However, this is incorrect when multiple preemptees belong to the same job. For example:

  • Job A has 5 ready tasks and its minAvailable is 4.
  • 2 tasks of job A are in preemptees.
  • Under the current logic each task is checked independently (4 <= 5 - 1), so both are considered preemptable.
  • After the 2 tasks are preempted, job A has only 3 ready tasks and no longer satisfies minAvailable.
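
One possible fix is to keep a running per-job count of ready tasks that is decremented each time a victim is selected, so later preemptees of the same job are checked against the reduced count. The following is only a minimal sketch of that idea, not the actual kube-batch code: JobInfo, TaskInfo, the Ready field, and selectVictims are simplified, hypothetical stand-ins for the scheduler's real types and for job.ReadyTaskNum().

```go
package main

import "fmt"

// JobInfo is a simplified, hypothetical stand-in for the scheduler's job type.
// Ready plays the role of job.ReadyTaskNum().
type JobInfo struct {
	MinAvailable int
	Ready        int
}

// TaskInfo is a simplified, hypothetical stand-in for a preemptee task.
type TaskInfo struct {
	Name string
	Job  string
}

// selectVictims returns the preemptees that can be evicted while taking into
// account victims already selected from the same job in this call.
func selectVictims(jobs map[string]*JobInfo, preemptees []TaskInfo) []TaskInfo {
	// occupied tracks the ready-task count per job after the victims chosen so far.
	occupied := make(map[string]int)
	var victims []TaskInfo

	for _, preemptee := range preemptees {
		job, found := jobs[preemptee.Job]
		if !found {
			continue
		}
		if _, seen := occupied[preemptee.Job]; !seen {
			occupied[preemptee.Job] = job.Ready
		}
		// Same condition as in the issue, but evaluated against the running count.
		if job.MinAvailable <= occupied[preemptee.Job]-1 || job.MinAvailable == 1 {
			occupied[preemptee.Job]--
			victims = append(victims, preemptee)
		}
	}
	return victims
}

func main() {
	// The scenario from the issue: 5 ready tasks, minAvailable 4, 2 preemptees.
	jobs := map[string]*JobInfo{"A": {MinAvailable: 4, Ready: 5}}
	preemptees := []TaskInfo{{Name: "a-1", Job: "A"}, {Name: "a-2", Job: "A"}}
	fmt.Println(len(selectVictims(jobs, preemptees))) // prints 1
}
```

With the scenario above, only the first task is selected, which leaves job A with 4 ready tasks and its minAvailable still satisfied. The sketch keeps the original MinAvailable == 1 special case unchanged.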

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@fejta-bot: Closing this issue.
