heluocs / mpi-operator

Repository for the MPI operator.

MPI Operator

The MPI Operator makes it easy to run allreduce-style distributed training on Kubernetes.

Deploy

Create the resources defined in the deploy/ directory:

kubectl create -f deploy/

Test

Launch a multi-node TensorFlow benchmark training job:

kubectl create -f examples/tensorflow-benchmarks.yaml

Once all pods are running, the training logs are available in the launcher pod.

License:Apache License 2.0

Languages: Go 96.5%, Shell 2.5%, Dockerfile 1.1%