horovod / horovod

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Home Page: http://horovod.ai

IPv6 address family

NEWPLAN opened this issue

Is your feature request related to a problem? Please describe.
I use Horovod to launch distributed training tasks, and my cluster has been migrated to the IPv6 address family. However, horovod.runner does not support such a cluster and throws errors, including but not limited to:

  1. Address parsing in "parse_hosts_and_slots@common.util.hosts.py": the function requires input addresses in the format "IP:slot-nums", which conflicts with the IPv6 format because IPv6 addresses themselves contain colons.
  2. Server listening in "find_port@util.network.py": addr=("", port) produces an IPv4 address.
  3. "BasicService@common.util.network.py" does not support the IPv6 family.
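To make the first point concrete, here is a minimal sketch of how a host entry could be parsed so that IPv6 addresses and slot counts do not collide. It assumes the bracketed form "[IPv6]:slots" (borrowed from URL syntax) as the convention. The function name `parse_host_slots` and the default of one slot are hypothetical and are not part of Horovod.

```python
import ipaddress


def parse_host_slots(spec, default_slots=1):
    """Split one host entry into (host, slots).

    Hypothetical helper, not Horovod's API. Accepted forms:
      "10.0.0.1:4", "node1:4", "node1", "[2001:db8::1]:4", "[::1]", "::1"
    """
    spec = spec.strip()
    if spec.startswith("["):
        # Bracketed IPv6 literal, optionally followed by ":slots".
        host, closing, rest = spec[1:].partition("]")
        if not closing:
            raise ValueError(f"missing ']' in host entry: {spec!r}")
        ipaddress.IPv6Address(host)  # raises ValueError if not valid IPv6
        if not rest:
            return host, default_slots
        if not rest.startswith(":"):
            raise ValueError(f"unexpected text after ']': {spec!r}")
        return host, int(rest[1:])
    if spec.count(":") > 1:
        # Bare IPv6 literal: a slot count cannot be told apart from the
        # address, so the whole entry is treated as the host.
        ipaddress.IPv6Address(spec)
        return spec, default_slots
    # IPv4 address or hostname, with at most one ":" separating the slots.
    host, sep, slots = spec.partition(":")
    return host, int(slots) if sep else default_slots
```

With this convention, "[2001:db8::1]:4" yields the host "2001:db8::1" with 4 slots, while the existing "IP:slot-nums" entries keep working unchanged.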

Describe the solution you'd like

Currently, I just add an environment variable to force the code to execute the IPv6 branch, and I hope for a better solution from the maintainers.
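For illustration only, the workaround could look roughly like the sketch below: an environment variable selects the address family used for the listening socket. The variable name `EXAMPLE_FORCE_IPV6` and both function names are hypothetical; this is neither the actual patch nor an existing Horovod setting.

```python
import os
import socket

# Hypothetical switch, not a Horovod setting.
FORCE_IPV6_ENV = "EXAMPLE_FORCE_IPV6"


def wildcard_bind_target(port, environ=os.environ):
    """Return (address family, bind address) for a wildcard listener."""
    if environ.get(FORCE_IPV6_ENV, "0") == "1":
        return socket.AF_INET6, ("::", port)
    # ("", port) with AF_INET binds the IPv4 wildcard address only.
    return socket.AF_INET, ("", port)


def open_listener(port=0, environ=os.environ):
    """Open a listening TCP socket; port 0 lets the OS pick a free port."""
    family, addr = wildcard_bind_target(port, environ)
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        if family == socket.AF_INET6:
            # Ask for a dual-stack socket so IPv4 clients can still
            # connect where the platform allows it.
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
        sock.bind(addr)
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```

A switch like this works as a stopgap, but every place that creates a socket has to consult it, which is why a solution inside the runner itself would be preferable.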

Describe alternatives you've considered

Additional context
The Horovod I use was installed with pip3, version 0.28.1.