microsoft / pai

Resource scheduling and cluster management for AI

Home Page: https://openpai.readthedocs.io

Problems about OpenPAI running without Internet

wangxianglang opened this issue

Short summary about the issue/question:
Does OpenPAI need Internet access while it is running?

Brief what process you are following:
We have a Docker image registry server in our LAN, and the OpenPAI cluster has access to it. The cluster's Internet access is cut off for security reasons, but it seems that jobs cannot start without Internet access.
Here is the event message of the submitted job:
Failed to pull image "openpai/openpai-runtime:v1.8.0": rpc error: code = Unknown desc = Error response from daemon: Get "https://registry-1.docker.io/v2/": dial tcp: lookup registry-1.docker.io: Temporary failure in name resolution

The name resolution failure itself is expected, because we cut off Internet access. But the openpai-runtime image is available locally. Why does it need to be pulled again from https://registry-1.docker.io/v2/?

How to reproduce it:

OpenPAI Environment:

  • OpenPAI version: 1.8.0
  • OS (e.g. from /etc/os-release): Ubuntu 20.04 LTS

Anything else we need to know:

The runtime verifies the job image to make sure it is valid, so it needs to call the Docker registry API to get that information.
If your machines cannot access the Internet, you can set up a private registry and use it in place of the default Docker registry.
See https://openpai.readthedocs.io/en/latest/manual/cluster-user/docker-images-and-job-examples.html#how-to-use-images-from-private-registry
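
For illustration, here is a rough Python sketch of why an image name such as openpai/openpai-runtime:v1.8.0 ends up at registry-1.docker.io, and what the name looks like once it points at a private registry. It is not OpenPAI code. It follows Docker's usual naming convention (a name without a registry host defaults to Docker Hub) and the manifest path of the Docker Registry HTTP API v2. The host registry.example.lan:5000 and the helper names split_image, manifest_url and with_private_registry are made up for this example, and whether the runtime's check queries exactly this URL is an assumption.

```python
# Docker Hub endpoint, as seen in the error message of this issue.
DEFAULT_REGISTRY = "registry-1.docker.io"


def split_image(image):
    """Split an image reference into (registry, repository, tag).

    Simplified sketch: digest references (name@sha256:...) are not handled.
    """
    first, sep, rest = image.partition("/")
    # Docker treats the first path component as a registry host only if it
    # contains "." or ":" or is exactly "localhost".
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = DEFAULT_REGISTRY, image
    repo, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        # No tag given, so Docker assumes "latest".
        repo, tag = remainder, "latest"
    if registry == DEFAULT_REGISTRY and "/" not in repo:
        # Official Docker Hub images live under the "library/" namespace.
        repo = "library/" + repo
    return registry, repo, tag


def manifest_url(image):
    """Build the Registry HTTP API v2 manifest URL for an image reference."""
    registry, repo, tag = split_image(image)
    return "https://{}/v2/{}/manifests/{}".format(registry, repo, tag)


def with_private_registry(image, host):
    """Rewrite an image reference so it points at a private registry host."""
    _, repo, tag = split_image(image)
    return "{}/{}:{}".format(host, repo, tag)


if __name__ == "__main__":
    image = "openpai/openpai-runtime:v1.8.0"
    # Hypothetical LAN registry host, used only for this example.
    private = with_private_registry(image, "registry.example.lan:5000")
    print(manifest_url(image))
    print(private)
    print(manifest_url(private))
```

With the unqualified name, the manifest URL points at registry-1.docker.io, which is unreachable in an offline cluster. With the registry host as a prefix, the same lookup stays inside the LAN, provided the image has been pushed to that registry and the job references it by the prefixed name.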