Qihoo360 / dgl-operator

The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network training on Kubernetes

DGL Operator

The DGL Operator makes it easy to run Deep Graph Library (DGL) graph neural network training, distributed or non-distributed, on Kubernetes. See the DGL documentation for an introduction to DGL and its distributed training philosophy.

🛠Prerequisites

  • Kubernetes >= 1.16

🚀Installation

You can deploy the operator with default settings by running the following commands:

git clone https://github.com/Qihoo360/dgl-operator
cd dgl-operator
kubectl create -f deploy/v1alpha1/dgl-operator.yaml

You can check whether the DGL Job custom resource is installed via:

kubectl get crd

The output should include dgljobs.qihoo.net like the following:

NAME                                       AGE
...
dgljobs.qihoo.net                          1m
...
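If you script the installation, you may prefer to poll for the CRD instead of reading the list by eye. The following is a minimal sketch, not part of the operator: the helper name wait_for_crd, the default of 30 attempts, and the 2-second interval are assumptions made for illustration. It relies only on `kubectl get crd <name>` returning a non-zero exit status when the CRD does not exist.

```shell
# Hypothetical helper (not shipped with dgl-operator): poll until the given
# CRD exists, or give up after a number of attempts (default 30).
wait_for_crd() {
    crd="$1"
    attempts="${2:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # `kubectl get crd <name>` exits non-zero while the CRD is missing.
        if kubectl get crd "$crd" >/dev/null 2>&1; then
            echo "CRD $crd is installed"
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    echo "CRD $crd not found after $attempts attempts" >&2
    return 1
}
# Example: wait_for_crd dgljobs.qihoo.net
```

The function returns 0 as soon as the CRD is visible, so it can gate later steps in a script, for example `wait_for_crd dgljobs.qihoo.net && kubectl create -f examples/v1alpha1/GraphSAGE.yaml`.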

🔬Creating a DGL Job

You can create a DGL job by defining a DGLJob config file. See the GraphSAGE.yaml or GraphSAGE_dist.yaml example config files for launching a single-node or multi-node GraphSAGE training job, respectively. You may change the config file based on your requirements.

# standalone GraphSAGE
cat examples/v1alpha1/GraphSAGE.yaml
# or a distributed version
cat examples/v1alpha1/GraphSAGE_dist.yaml
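After editing a config file, it can help to check that the API server accepts it before creating anything. The sketch below is one way to do that, assuming kubectl 1.18 or newer (where the `--dry-run=server` value is available) and that the operator's CRD is already installed. The helper name validate_dgljob is hypothetical.

```shell
# Hypothetical helper (not shipped with dgl-operator): ask the API server to
# process a DGLJob config without persisting it.
# Assumes kubectl >= 1.18 for --dry-run=server.
validate_dgljob() {
    kubectl create --dry-run=server -f "$1"
}
# Example: validate_dgljob examples/v1alpha1/GraphSAGE.yaml
```

A non-zero exit status indicates the server rejected the file, so the helper can be used as a pre-check in scripts.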

Deploy the DGLJob resource to start training:

# standalone GraphSAGE
kubectl create -f examples/v1alpha1/GraphSAGE.yaml
# or a distributed version
kubectl create -f examples/v1alpha1/GraphSAGE_dist.yaml
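Once the job is created, you will likely want to see what the operator did with it. The sketch below lists the DGLJob resources and the pods in the current namespace. The resource name dgljobs.qihoo.net comes from the CRD shown in the installation section; the helper name show_dgl_status is hypothetical, and running the job in the current namespace is an assumption.

```shell
# Hypothetical helper (not shipped with dgl-operator): show DGLJob resources
# and pods in the current namespace.
show_dgl_status() {
    # Fully qualified resource name, taken from the installed CRD.
    kubectl get dgljobs.qihoo.net
    # Pods created for the job appear here; their names depend on the operator.
    kubectl get pods
}
# Example: show_dgl_status
```

To follow the training output, pick a pod name from the list and run `kubectl logs -f <pod-name>`.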

💭 Reference

Please check out these previous works that helped inspire the creation of DGL Operator

License: Apache License 2.0

Languages: Go 68.6%, Python 14.2%, Shell 12.6%, Makefile 3.5%, Dockerfile 1.2%