Support gang-scheduling by kube-batch
k82cn opened this issue
Klaus Ma commented
Gang scheduling is a common requirement for training jobs, and kube-batch supports it now :) Opening this issue to track the discussion.
Klaus Ma commented
/kind feature
Klaus Ma commented
@johnugeorge, what's the plan for 0.5?
Johnu George commented
@k82cn
Currently, gang scheduling behaves consistently across the TF and PyTorch operators.
Are you working on kubeflow/training-operator#916? It is planned for the 0.5 release across all operators.
Klaus Ma commented
Regarding kubeflow/training-operator#916, replacing PDB with PodGroup is done; I'm now thinking about how to bring other advanced kube-batch features in through PodGroup :). I'll investigate after 0.5 :)
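To make the gang constraint concrete: a kube-batch PodGroup carries a `minMember` count, and the scheduler should start at least that many pods of the group together or none of them. The sketch below illustrates only that all-or-nothing rule, under simplifying assumptions: CPU is the only resource, pods are placed greedily first-fit, and `gang_schedule` is a hypothetical helper, not kube-batch's API or its actual allocation algorithm.

```python
from typing import Dict, List, Optional


def gang_schedule(
    pod_cpu: List[int],
    node_free_cpu: Dict[str, int],
    min_member: int,
) -> Optional[Dict[int, str]]:
    """Hypothetical gang check: return {pod index: node} if at least
    min_member pods fit at once, else None (start nobody)."""
    # Work on a copy so a failed gang attempt leaves the cluster state untouched.
    free = dict(node_free_cpu)
    placement: Dict[int, str] = {}
    # Simplified first-fit placement; real schedulers use richer predicates/scoring.
    for i, cpu in enumerate(pod_cpu):
        for node, avail in free.items():
            if avail >= cpu:
                free[node] = avail - cpu
                placement[i] = node
                break
    # All-or-nothing: commit only if the gang's minimum size is satisfied.
    if len(placement) >= min_member:
        return placement
    return None


if __name__ == "__main__":
    # Three 2-CPU workers need nodes with 4 and 1 free CPUs: only two fit.
    print(gang_schedule([2, 2, 2], {"a": 4, "b": 1}, min_member=3))
    print(gang_schedule([2, 2, 2], {"a": 4, "b": 2}, min_member=3))
```

With `min_member=3` the first call starts nothing even though two workers would fit, which is exactly the partial-start situation gang scheduling is meant to prevent for distributed training jobs.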