google-deepmind / open_spiel

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.

Recommended AlphaZero training config parameters

robinpdev opened this issue · comments

I'm training the Hex game with AlphaZero right now on a 5x5 board.
My config works but is very slow, taking about two days to complete 100 steps, which gets the model to a sufficient level of play.

I'm training on two NVIDIA V100 GPUs and 12 CPU cores, with around 90 GB of RAM available.

This is the config I'm using:
```json
{
  "actors": 4,
  "checkpoint_freq": 1,
  "cutoff_probability": 0.8,
  "cutoff_value": 0.95,
  "devices": "cuda:0,cuda:1,cpu:0,cpu:1,cpu:2,cpu:3,cpu:4,cpu:5,cpu:7,cpu:8,cpu:9,cpu:10",
  "eval_levels": 3,
  "evaluation_window": 100,
  "evaluators": 2,
  "explicit_learning": true,
  "game": "rthex",
  "graph_def": "vpnet.pb",
  "inference_batch_size": 6,
  "inference_cache": 262144,
  "inference_threads": 3,
  "learning_rate": 0.0001,
  "max_simulations": 100,
  "max_steps": 0,
  "nn_depth": 10,
  "nn_model": "resnet",
  "nn_width": 128,
  "path": "/data/gent/465/vsc46525/shared/robin/rthex_swap_1",
  "policy_alpha": 1.0,
  "policy_epsilon": 0.25,
  "replay_buffer_reuse": 3,
  "replay_buffer_size": 65536,
  "temperature": 1.0,
  "temperature_drop": 10.0,
  "train_batch_size": 4096,
  "uct_c": 2.0,
  "weight_decay": 0.0001
}
```

Is there any way to make better use of my available resources? GPU memory usage is only a few GB (far more is available) and GPU utilization is around 40%. Only a few of the CPU cores are used as well.
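For context, here is a back-of-the-envelope sketch using the numbers above. This is only rough arithmetic: the assumption that one training step consumes one `train_batch_size` batch, and that `replay_buffer_reuse` means each state is trained on roughly that many times, is my reading of the parameters and should be verified against the AZ implementation.

```python
# Rough throughput math from the numbers reported in this issue.
days = 2
steps = 100
train_batch_size = 4096
replay_buffer_reuse = 3

seconds_per_step = days * 24 * 3600 / steps
print(f"~{seconds_per_step / 60:.1f} minutes per training step")

# If each self-play state is reused ~replay_buffer_reuse times, each step
# needs roughly this many fresh states from the actors (assumption, see above):
fresh_states_per_step = train_batch_size / replay_buffer_reuse
print(f"~{fresh_states_per_step:.0f} new self-play states needed per step")
```

If the actors can't generate states at that rate, the learner (and the GPUs) will sit partly idle, which would be consistent with the ~40% utilization you're seeing.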

Hi,

Which AlphaZero: the Python-only TF one, or the C++ LibTorch-based one?

@tewalds might have some insights for you. Timo, does this sound right?

I'm using the LibTorch-based one because I thought the TensorFlow version is not in a usable state right now.

Thanks. The Python-only TF AZ should still work, it's just the C++ one that we never got to work externally.

In that case I think the best person to contact would be @christianjans, though I'm not sure if we ever tested this implementation thoroughly with GPUs. You can also try @mrdaliri who did his thesis using AZ on OpenSpiel Hex.

Unfortunately we don't have the time or resources to fully support a larger-scale AlphaZero, so the ones in OpenSpiel are meant to be basic, correct example implementations. If you don't get it working, there are some larger-scale ones that can still be run on OpenSpiel games: see RLlib and muzero-general.

Quick heads-up: I contacted Tom Anthony, who did his Ph.D. thesis on Hex, and I expect this should be faster. Possibly LibTorch is not properly set up to use CUDA?

Hi, sorry for the late response on LibTorch AZ issues. I'm happy to hear you're using it!

While working with Libtorch AZ in the past, I did try it with GPUs, but found that there wasn't a huge performance increase (at least with Clobber – the game I was playing).

I added the explicit_learning flag to alpha_zero_torch_example.cc which, when set to true, dedicates one GPU for NN weight updates, and the other GPUs for inference. I found that this sped up training, but I see that you already have this parameter set to true.

If you have 90 GB of memory, I wonder if an increased replay buffer size would help speed up training?
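For example, a trimmed config fragment with a hypothetical 4x bump of the buffer (the values here are illustrative only; the actual memory cost per buffered state depends on the game's observation size, so this would need to be measured):

```json
{
  "replay_buffer_size": 262144,
  "replay_buffer_reuse": 3
}
```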

I also did a bit of a parameter write-up with the Python AZ, which may be useful; I've attached it to this comment. I believe one of my findings there was that a high ratio of replay buffer size to replay buffer reuse sped up the training.
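To make that ratio concrete, here is a small sketch. The interpretation of the ratio as "new self-play states accumulated between weight updates" is my reading of the parameters, not something confirmed by the AZ source, so treat it as an assumption.

```python
# Buffer-size-to-reuse ratio, using the config from this issue and a
# hypothetical 4x larger buffer for comparison.

def states_between_updates(replay_buffer_size: int, replay_buffer_reuse: int) -> int:
    """Assumed ratio: new states accumulated per learner update."""
    return replay_buffer_size // replay_buffer_reuse

current = states_between_updates(65536, 3)   # config from this issue
bigger = states_between_updates(262144, 3)   # hypothetical 4x buffer

print(current, bigger)  # 21845 87381
```

A larger buffer at the same reuse raises this ratio, which matches the write-up's observation that a high ratio sped up training.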

The writeup was done in my early undergrad years so it definitely has its flaws haha. Let me know if you have any questions.

clobber_alphazero_writeup_2021.pdf

Thank you very much for the responses. I'll try these tips out, and if I find anything interesting I'll respond with my findings.