kubeflow / pytorch-operator

PyTorch on Kubernetes

Support gang-scheduling by kube-batch

k82cn opened this issue

Gang scheduling is a common requirement for training jobs; kube-batch supports it now :) Opening this issue to track the discussion.
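
For readers new to the term: gang scheduling means a job's pods are placed all together or not at all, so a distributed training job never starts with only some of its workers. Below is a minimal Python sketch of that all-or-nothing rule. It is not kube-batch's implementation; the `Pod` type, the `gang_schedule` function, the CPU-only resource model, and the first-fit placement are all hypothetical simplifications for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pod:
    name: str
    cpu: int  # requested CPU units (hypothetical, simplified resource model)


def gang_schedule(
    pods: List[Pod], min_member: int, free_cpu: Dict[str, int]
) -> Optional[Dict[str, str]]:
    """Return a pod -> node placement only if at least `min_member` pods fit.

    Returns None otherwise, so no pod of the group is started on its own.
    """
    remaining = dict(free_cpu)  # work on a copy: a failed attempt reserves nothing
    placement: Dict[str, str] = {}
    for pod in pods:
        # First-fit: take the first node with enough free CPU (an assumption,
        # real schedulers use far richer predicates and priorities).
        for node, free in remaining.items():
            if free >= pod.cpu:
                remaining[node] = free - pod.cpu
                placement[pod.name] = node
                break
    if len(placement) < min_member:
        return None  # the gang cannot be satisfied, so place nothing
    return placement


job = [Pod("master-0", 2), Pod("worker-0", 2), Pod("worker-1", 2)]

# Only two of the three pods fit, so the whole group is rejected.
print(gang_schedule(job, min_member=3, free_cpu={"node-a": 4}))

# With a second node, all three pods fit and the group is admitted.
print(gang_schedule(job, min_member=3, free_cpu={"node-a": 4, "node-b": 2}))
```

Without this rule, a scheduler could start `master-0` and `worker-0`, leave `worker-1` pending, and hold resources for a job that cannot make progress.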

/kind feature

@johnugeorge, what's the plan for 0.5?

@k82cn
Currently, gang scheduling behavior is consistent across the TF and PyTorch operators.

Are you working on kubeflow/training-operator#916? This is planned for 0.5 release across operators.

Regarding kubeflow/training-operator#916, replacing PDB with PodGroup is done; I'm thinking about how to add some other advanced features from kube-batch via PodGroup :). I'll do some investigation after 0.5 :)