Project crashes after a few runs

Question

Muramas opened this issue 7 months ago

I had this issue before, but I did a fresh clone and it is still happening. Training runs for a few cycles before it fails, and this is the output when it does:

step: 16000 event: 8.00 level: 4.00 heal: 0.00 op_lvl: 0.00 dead: -0.40 badge: 0.00 explore: 37.08 sum: 48.68
Traceback (most recent call last):
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 314, in _bootstrap
self.run()
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 35, in _worker
observation, reward, terminated, truncated, info = env.step(data)
^^^^^^^^^^^^^^
File "H:\PokemonAI\PokemonRedExperiments\baselines\red_gym_env.py", line 224, in step
self.save_and_print_info(step_limit_reached, obs_memory)
File "H:\PokemonAI\PokemonRedExperiments\baselines\red_gym_env.py", line 401, in save_and_print_info
plt.imsave(
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\matplotlib\pyplot.py", line 2200, in imsave
return matplotlib.image.imsave(fname, arr, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\matplotlib\image.py", line 1689, in imsave
image.save(fname, **pil_kwargs)
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\PIL\Image.py", line 2429, in save
fp = builtins.open(filename, "w+b")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument: 'session_30e6d259\curframe_a4836dd0.jpeg'
304229 pyboy.pyboy INFO ###########################
304229 pyboy.pyboy INFO # Emulator is turning off #
304230 pyboy.pyboy INFO ###########################
Traceback (most recent call last):
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\connection.py", line 311, in _recv_bytes
nread, err = ov.GetOverlappedResult(True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
BrokenPipeError: [WinError 109] The pipe has been ended

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "H:\PokemonAI\PokemonRedExperiments\baselines\run_baseline_parallel_fast.py", line 83, in <module>
model.learn(total_timesteps=(ep_length)*num_cpu*1000, callback=CallbackList(callbacks))
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\ppo\ppo.py", line 308, in learn
return super().learn(
^^^^^^^^^^^^^^
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 259, in learn
continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\on_policy_algorithm.py", line 178, in collect_rollouts
new_obs, rewards, dones, infos = env.step(clipped_actions)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py", line 197, in step
return self.step_wait()
^^^^^^^^^^^^^^^^
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\vec_transpose.py", line 95, in step_wait
observations, rewards, dones, infos = self.venv.step_wait()
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 130, in step_wait
results = [remote.recv() for remote in self.remotes]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\site-packages\stable_baselines3\common\vec_env\subproc_vec_env.py", line 130, in <listcomp>
results = [remote.recv() for remote in self.remotes]
^^^^^^^^^^^^^
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\connection.py", line 249, in recv
buf = self._recv_bytes()
^^^^^^^^^^^^^^^^^^
File "C:\Users<user>\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\connection.py", line 320, in _recv_bytes
raise EOFError
EOFError
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.

Rousk · Answer 1 · Sat Oct 28 2023 00:05:10 GMT+0800 (China Standard Time)

37a6e8e

That commit is a temporary fix. I think the puffertank version fixed this properly; I'll need to see how they did it.

You may also want to fix the typo in the error messages xD
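
For anyone hitting the same traceback: the worker process dies inside the `plt.imsave(...)` call in `save_and_print_info`, and the `BrokenPipeError` / `EOFError` in the main process is a consequence of that worker exiting. Below is a minimal sketch of one way to keep a failed debug-frame write from killing the worker. It is not necessarily what the commit above does. The helper name `save_frame_safely` and its parameters are hypothetical, and it assumes the failure is intermittent (the run completes several cycles before crashing), so retrying and then skipping the frame is acceptable.

```python
import time
from pathlib import Path
from typing import Callable


def save_frame_safely(
    save_fn: Callable[[Path], None],
    path: Path,
    retries: int = 3,
    delay_s: float = 0.05,
) -> bool:
    """Try to write a debug frame; return False instead of raising on failure.

    `save_fn` is any callable that writes an image to `path`
    (hypothetical helper, not part of the project or of matplotlib).
    """
    # Make sure the session directory exists before writing into it.
    path.parent.mkdir(parents=True, exist_ok=True)
    for attempt in range(1, retries + 1):
        try:
            save_fn(path)
            return True
        except OSError as exc:
            # Assumption: the error is intermittent, so wait briefly and retry.
            print(f"frame save failed (attempt {attempt}/{retries}): {exc}")
            time.sleep(delay_s)
    # Give up on this frame; the caller can keep stepping the environment.
    return False
```

In the environment, `save_fn` could wrap the existing call, for example `lambda p: plt.imsave(p, frame)`, where `frame` stands for whatever array is currently being saved. Losing an occasional debug frame is usually cheaper than losing the whole training run, but this only hides the symptom; the underlying cause of the `Errno 22` would still need to be tracked down.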