PaddlePaddle / PARL

A high-performance distributed training framework for Reinforcement Learning

Home Page: https://parl.readthedocs.io/

A2C model training fails with an error

USTCKAY opened this issue

I am using the latest develop branch of PARL with paddle 2.5.0. I tried both GPU and CPU, and the same error occurs in both cases. The full error message is:
`[07-25 00:36:45 MainThread @client.py:211] WRN [Client] Can not connect to the master, please check if master is started and ensure the input address localhost:8110 is correct.
Traceback (most recent call last):
File "/workspace/PARL/parl/remote/client.py", line 206, in _create_sockets
message = self.submit_job_socket.recv_multipart()
File "/opt/py37env/lib/python3.7/site-packages/zmq/sugar/socket.py", line 475, in recv_multipart
parts = [self.recv(flags, copy=copy, track=track)]
File "zmq/backend/cython/socket.pyx", line 791, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 827, in zmq.backend.cython.socket.Socket.recv
File "zmq/backend/cython/socket.pyx", line 191, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/socket.pyx", line 186, in zmq.backend.cython.socket._recv_copy
File "zmq/backend/cython/checkrc.pxd", line 19, in zmq.backend.cython.checkrc._check_rc
zmq.error.Again: Resource temporarily unavailable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 195, in <module>
learner = Learner(config)
File "train.py", line 63, in __init__
self.create_actors()
File "train.py", line 68, in create_actors
parl.connect(self.config['master_address'])
File "/workspace/PARL/parl/remote/client.py", line 430, in connect
GLOBAL_CLIENT = Client(master_address, cur_process_id, distributed_files)
File "/workspace/PARL/parl/remote/client.py", line 73, in __init__
self._create_sockets(master_address)
File "/workspace/PARL/parl/remote/client.py", line 215, in _create_sockets
"address {} is correct.".format(master_address))
Exception: Client can not connect to the master, please check if master is started and ensure the input address localhost:8110 is correct.`
Could you please take a look? Thanks a lot!

A2C training requires xparl to be started first.
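
The traceback shows `parl.connect` timing out against `localhost:8110`, which is consistent with no xparl master running there (on a typical setup one would be started with something like `xparl start --port 8110`). As a rough pre-flight check, the sketch below tests whether anything accepts TCP connections at the configured address before `train.py` calls `parl.connect`. The helper name `master_reachable` is hypothetical and not part of PARL, and an open port only shows that some process is listening there, not that it is an xparl master.

```python
import socket


def master_reachable(address, timeout=1.0):
    """Return True if a TCP connection to 'host:port' succeeds.

    Hypothetical helper, not part of PARL. It only checks that some
    process is listening at the address.
    """
    try:
        host, port = address.rsplit(":", 1)
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: refused, timed out, or unresolvable host.
        # ValueError: address is not of the form 'host:port'.
        return False


if __name__ == "__main__":
    address = "localhost:8110"  # address taken from the error message above
    if master_reachable(address):
        print("Something is listening at", address)
    else:
        print("No listener at", address, "- start xparl before running train.py")
```

If the check fails, the address passed as `master_address` in the config and the port given to xparl most likely do not match, or the cluster has not been started yet.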