ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Home Page: https://ray.io

Ray Cluster: Failed to create a ray cluster using running container

hahmad2008 opened this issue

What happened + What you expected to happen

I am using ray==2.9.2 inside an already-running container, so I start the head node with the following command:

docker exec -it MY_CONTAINER ray start --head --object-manager-port=8076 --node-manager-port=8077
The command reported that the head node was created successfully.
However, when I then tried to check the cluster status, it failed:

docker exec -it MY_CONTAINER ray status

Traceback (most recent call last):
  File "python/ray/_raylet.pyx", line 3168, in ray._raylet.check_health
  File "python/ray/_raylet.pyx", line 580, in ray._raylet.check_status
ray.exceptions.RpcError: failed to connect to all addresses; last error: UNKNOWN: ipv4:11.1.1.111:6379: Failed to connect to remote host: Connection refused

What is the problem here?

Versions / Dependencies

ray==2.9.2

Reproduction script

The same two commands from the description above reproduce the failure (ray status ends with the RpcError shown there):

docker exec -it MY_CONTAINER ray start --head --object-manager-port=8076 --node-manager-port=8077
docker exec -it MY_CONTAINER ray status

Issue Severity

High: It blocks me from completing my task.

KubeRay (https://github.com/ray-project/kuberay) is the recommended way to run a Ray cluster in containers on Kubernetes. Can you try that?
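
Independently of that, it may help to confirm whether anything is still listening on the address from the traceback. "Connection refused" usually means no process is accepting connections on that host and port, and 6379 is the default GCS port of a Ray head node. The following is only a minimal diagnostic sketch, not part of Ray: the helper name port_is_open is made up for illustration, and the host and port in the usage comment are copied from the error message above.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability
    and does not verify that the listener is actually a Ray GCS server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example call (address taken from the traceback; adjust as needed):
#     port_is_open("11.1.1.111", 6379)
```

Running the check from inside the container right after ray start, and again just before ray status, would show whether the head process is still accepting connections at that point.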