tensorflow / agents

TF-Agents: A reliable, scalable and easy to use TensorFlow library for Contextual Bandits and Reinforcement Learning.

Error in parallel environment processing: BrokenPipeError [WinError 109]

roeslib opened this issue · comments

Could someone please help me? I am training a PPO model with 128 parallel environments, and at step 2340992 the script stops with the error below. I tried reducing the number of parallel environments, but the error persists.

Traceback (most recent call last):
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\lib\multiprocessing\connection.py", line 301, in _recv_bytes
    ov, err = _winapi.ReadFile(self._handle, bsize,
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 17, in <module>
    train_model(args, writer, run_name)
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\PPO_3DCONV2DCONV\modeltraining.py", line 102, in train_model
    next_obs, reward, done, info = envs.step(action.cpu().numpy())
  File "c:\users\libia\anaconda3\envs\rlenvironment\baselines\baselines\common\vec_env\vec_env.py", line 108, in step
    return self.step_wait()
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\PPO_3DCONV2DCONV\vectorizedenvs_test.py", line 125, in step_wait
    obs, reward, done, info = self.venv.step_wait()
  File "c:\users\libia\anaconda3\envs\rlenvironment\baselines\baselines\common\vec_env\vec_normalize.py", line 27, in step_wait
    obs, rews, news, infos = self.venv.step_wait()
  File "c:\users\libia\anaconda3\envs\rlenvironment\baselines\baselines\common\vec_env\shmem_vec_env.py", line 76, in step_wait
    outs = [pipe.recv() for pipe in self.parent_pipes]
  File "c:\users\libia\anaconda3\envs\rlenvironment\baselines\baselines\common\vec_env\shmem_vec_env.py", line 76, in <listcomp>
    outs = [pipe.recv() for pipe in self.parent_pipes]
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\lib\multiprocessing\connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "C:\Users\Libia\anaconda3\envs\rlenvironment\lib\multiprocessing\connection.py", line 321, in _recv_bytes
    raise EOFError
EOFError
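For context on what this traceback means: when a vectorized-env worker process dies (for any reason), the parent's `pipe.recv()` in `step_wait()` raises `EOFError`/`BrokenPipeError`, so the error you see in the parent usually hides a crash inside the worker. Here is a minimal, hypothetical sketch (stdlib only, not TF-Agents or Baselines code) that reproduces that failure mode:

```python
import multiprocessing as mp


def worker(conn):
    """Stand-in for an env worker: take one command, then crash."""
    conn.recv()                                  # wait for a "step" command
    raise RuntimeError("environment crashed")    # worker dies mid-episode


def step_once():
    """Return the name of what happens on the parent's recv()."""
    parent, child = mp.Pipe()
    p = mp.Process(target=worker, args=(child,))
    p.start()
    child.close()          # parent keeps only its own end of the pipe
    parent.send("step")
    try:
        parent.recv()      # worker is dead, its pipe end is closed
        result = "ok"
    except EOFError:       # same EOFError as in the traceback above
        result = "EOFError"
    p.join()
    return result


if __name__ == "__main__":
    print(step_once())     # the worker crash surfaces as EOFError here
```

The practical takeaway is to look for the real exception in the worker (e.g. by running with a single non-parallel environment) rather than debugging the pipe error itself.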

You are not going to like this answer, and I am sorry: we do not really support Windows. We moved to using Reverb as our replay buffer, and we only compile it for Linux. No one on the core team uses Windows.

I also think you might have the wrong project. This is TF-Agents, not Baselines; Baselines is another RL project. I don't use Anaconda, so I am not totally sure what you have installed, and ten seconds of searching did not tell me what rlenvironment is, but Baselines is something I am familiar with, so that is my best guess. Those paths are also not something I recognize in this code base. I could be wrong, but that is my off-the-cuff read.

Thank you for your comment. I solved my problem: it was a programming error in the inheritance of a class deriving from baselines.bench.Monitor. After fixing it, I let the agent train and it reached 10M steps. I am currently working on Windows, but I am moving to Ubuntu to use other RL frameworks.
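The original poster does not say which inheritance mistake it was, but a common one with wrapper classes like `baselines.bench.Monitor` is a subclass that skips `super().__init__()`, leaving the base wrapper half-initialized so the first `step()` crashes inside the worker. A hypothetical, self-contained illustration (plain Python stand-ins, not the real Baselines classes):

```python
class Monitor:
    """Stand-in for a Baselines-style env wrapper."""

    def __init__(self, env):
        self.env = env

    def step(self, action):
        return self.env.step(action)


class BadMonitor(Monitor):
    def __init__(self, env, log_dir):
        self.log_dir = log_dir  # bug: super().__init__(env) never called


class GoodMonitor(Monitor):
    def __init__(self, env, log_dir):
        super().__init__(env)   # fix: initialize the base wrapper first
        self.log_dir = log_dir


class DummyEnv:
    def step(self, action):
        return 0, 0.0, False, {}


if __name__ == "__main__":
    try:
        BadMonitor(DummyEnv(), "logs").step(0)
    except AttributeError as e:
        # the worker would die here, and the parent process would only
        # see a BrokenPipeError/EOFError on its pipe
        print("worker would crash:", e)

    print(GoodMonitor(DummyEnv(), "logs").step(0))
```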

You are right, I made a mistake: this is the wrong project's questions-and-answers forum, and I should have chosen Baselines instead. Should I delete my post? How can I do it?