liuruoze / mini-AlphaStar

(JAIR'2022) A mini-scale reproduction of the AlphaStar program. Note: the original AlphaStar is the AI developed by DeepMind to play StarCraft II. JAIR = Journal of Artificial Intelligence Research.

CUDA out of memory

khcf123 opened this issue · comments

Hi, I'm running Windows 10 with the latest StarCraft II. The error below appears when I run "python run.py":

(torch_1_5) PS C:\Users\alexa\Downloads\mini-AlphaStar-main> python run.py
pygame 2.1.2 (SDL 2.0.18, Python 3.7.11)
Hello from the pygame community. https://www.pygame.org/contribute.html
run init
cudnn available
cudnn version 7604
initialed player
initialed teacher
start_time before training: 2022-01-01 18:11:32
map name: Simple64
player.name: MainPlayer
player.race: Race.protoss
start_time before reset: 2022-01-01 18:13:12
total_episodes: 1
start_episode_time before is_final: 2022-01-01 18:13:13
ActorLoop.run() Exception cause return, Detials of the Exception: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.27 GiB already allocated; 0 bytes free; 1.33 GiB reserved in total by PyTorch)
Traceback (most recent call last):
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\rl\rl_vs_computer_wo_replay.py", line 213, in run
player_step = self.player.agent.step_from_state(state, player_memory)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\rl\alphastar_agent.py", line 235, in step_from_state
hidden_state=hidden_state)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\arch\agent.py", line 299, in action_logits_by_state
return_logits = True)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\arch\arch_model.py", line 134, in forward
entity_embeddings, embedded_entity, entity_nums = self.entity_encoder(state.entity_state)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\torch\nn\modules\module.py", line 550, in call
result = self.forward(*input, **kwargs)
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\arch\entity_encoder.py", line 390, in forward
unit_types_one = torch.nonzero(batch, as_tuple=True)[-1]
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 2.00 GiB total capacity; 1.27 GiB already allocated; 0 bytes free; 1.33 GiB reserved in total by PyTorch)

run over

Can I know what the problem is and how to solve it? Thanks.

Yes, this happens because your GPU does not have enough memory.

To fix the problem, try decreasing the numbers on line 199 of

alphastarmini/lib/hyper_parameters.py

MiniStar_Arch_Hyper_Parameters = ArchHyperParameters(batch_size=int(32 * 1.5 / P.Batch_Scale), sequence_length=int(32 * 8 / P.Seq_Scale),

batch_size and sequence_length can be set to smaller values so that the model fits in your GPU memory. Both are divided by a scale factor, so also check the values of Batch_Scale and Seq_Scale defined in param.py: a larger scale gives a smaller batch_size or sequence_length.
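As a rough illustration of how the two scale factors affect the values on that line, the sketch below evaluates the same two expressions for a few scale settings. The function name `scaled_sizes` and the scale values tried are hypothetical; the actual values of Batch_Scale and Seq_Scale in param.py are not shown in this thread, and no particular setting is guaranteed to fit in 2 GB.

```python
def scaled_sizes(batch_scale, seq_scale):
    """Evaluate the two expressions from the hyper-parameter line quoted above."""
    batch_size = int(32 * 1.5 / batch_scale)
    sequence_length = int(32 * 8 / seq_scale)
    return batch_size, sequence_length


# Hypothetical scale values, chosen only to show the trend.
for batch_scale, seq_scale in [(1, 1), (8, 8), (16, 32)]:
    batch_size, sequence_length = scaled_sizes(batch_scale, seq_scale)
    print(f"Batch_Scale={batch_scale}, Seq_Scale={seq_scale} -> "
          f"batch_size={batch_size}, sequence_length={sequence_length}")
```

Larger scales shrink both values, which is the direction to move in when memory is tight.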

Alternatively, you can run the program on the CPU on your laptop and switch to the GPU when you move to a server.

To switch from GPU to CPU, change the value on line 2 of

run.py

USED_DEVICES = "0"

to

USED_DEVICES = "-1"
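A minimal sketch of why "-1" selects the CPU, assuming that run.py exports this value as the CUDA_VISIBLE_DEVICES environment variable before PyTorch touches the GPU (the thread does not show that part of run.py, so treat it as an assumption). The variable `device_name` is hypothetical, and the torch import is guarded only so the sketch also runs where PyTorch is not installed.

```python
import os

USED_DEVICES = "-1"  # "-1" hides every GPU; "0" would expose the first one

# Assumption: the setting is applied through CUDA_VISIBLE_DEVICES.
os.environ["CUDA_VISIBLE_DEVICES"] = USED_DEVICES

try:
    import torch
    # With no visible GPU, PyTorch reports CUDA as unavailable.
    use_cuda = torch.cuda.is_available()
except ImportError:
    use_cuda = False

device_name = "cuda:0" if use_cuda else "cpu"
print(device_name)
```

Running on the CPU avoids the GPU memory limit entirely, at the cost of slower training.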

Hope this solves your problem.

Thanks for your reply.

My laptop GPU has 2 GB of memory.
What values of batch_size and sequence_length (and of Batch_Scale and Seq_Scale in param.py) would be small enough to fit in my GPU memory?

Could you please advise me? Thanks!

Yes, I got it to work after restarting the laptop and changing USED_DEVICES = "0" to USED_DEVICES = "-1".

However, another error appears:

ActorLoop.run() Exception cause return, Detials of the Exception: The game didn't advance to the expected game loop. Expected: 2712, got: 2709
Traceback (most recent call last):
File "C:\Users\alexa\Downloads\mini-AlphaStar-main\alphastarmini\core\rl\rl_vs_computer_wo_replay.py", line 253, in run
timesteps = env.step(env_actions, step_mul=STEP_MUL) # STEP_MUL step_mul
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\lib\stopwatch.py", line 212, in _stopwatch
return func(*args, **kwargs)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 548, in step
return self._step(step_mul)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 565, in _step
return self._observe(target_game_loop=target_game_loop)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 670, in _observe
self._get_observations(target_game_loop)
File "C:\Users\alexa\anaconda3\envs\torch_1_5\lib\site-packages\pysc2\env\sc2_env.py", line 645, in _get_observations
"Expected: %s, got: %s") % (target_game_loop, game_loop))
ValueError: The game didn't advance to the expected game loop. Expected: 2712, got: 2709

run over

Yes, this problem occasionally occurs with SC2 on Windows (it is rare, and I don't know the cause). However, it is outside the scope of the current issue and should be discussed separately. Please open a new issue; I will close this one for you.