alibaba / EasyRec

A framework for large scale recommendation algorithms.

Distributed multi-worker issue

cici-tan opened this issue

I got errors when running train_multi_worker.sh:

INFO:tensorflow:TF_CONFIG environment variable: {'cluster': {'worker': ['localhost:2224', 'localhost:2223']}, 'task': {'type': 'chief', 'index': 0}}
I0607 18:58:55.754451 4508605952 run_config.py:535] TF_CONFIG environment variable: {'cluster': {'worker': ['localhost:2224', 'localhost:2223']}, 'task': {'type': 'chief', 'index': 0}}
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
tf.app.run()
File "/usr/local/lib/python3.7/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "/usr/local/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/run_config.py", line 569, in __init__
self._init_distributed_setting_from_environment_var(tf_config)
File "/usr/local/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/run_config.py", line 630, in _init_distributed_setting_from_environment_var
self._cluster_spec, task_env, TaskType.CHIEF)
File "/usr/local/lib/python3.7/site-packages/tensorflow_estimator/python/estimator/run_config.py", line 161, in _validate_task_type_and_task_id
chief_task_type)
ValueError: If "cluster" is set in TF_CONFIG, it must have one "chief" node.
easy_rec version: 0.4.6
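
For reference, the error message suggests the estimator rejects this TF_CONFIG because the task type is 'chief' while the cluster lists only 'worker' addresses. Below is a minimal sketch of a configuration that would contain one "chief" entry. The helper names build_tf_config and has_single_chief are hypothetical, and using localhost:2224 as the chief address is only an assumption based on the ports in the log. It is not necessarily how train_multi_worker.sh is meant to be configured.

```python
import json
import os


def build_tf_config(chief, workers, task_type, task_index):
    """Hypothetical helper: build a TF_CONFIG dict whose cluster has one chief."""
    return {
        "cluster": {"chief": [chief], "worker": list(workers)},
        "task": {"type": task_type, "index": task_index},
    }


def has_single_chief(tf_config):
    """Hypothetical check mirroring the condition named in the error message."""
    return len(tf_config.get("cluster", {}).get("chief", [])) == 1


# The configuration from the log: two workers, no "chief" key in the cluster.
reported = {
    "cluster": {"worker": ["localhost:2224", "localhost:2223"]},
    "task": {"type": "chief", "index": 0},
}

# One possible alternative (an assumption): list the first address as chief.
fixed = build_tf_config("localhost:2224", ["localhost:2223"], "chief", 0)

# TF_CONFIG is read from the environment as a JSON string.
os.environ["TF_CONFIG"] = json.dumps(fixed)

print(has_single_chief(reported), has_single_chief(fixed))
```

Each process in the cluster would need its own "task" entry (for example type "worker", index 0 for the second process) while sharing the same "cluster" section.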

What is the TensorFlow version?

2.1.0

Is the problem still occurring?

If you still have questions, please reopen the issue.