jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of Deepmind's AlphaZero algorithm.

Home Page: https://jonathan-laurent.github.io/AlphaZero.jl/stable/


Success report and request for help

bwanab opened this issue

As I mentioned in another issue, I've been working on training an AI agent to play Othello/Reversi. I wanted to report that I've had some pretty decent success using AlphaZero.jl, much more than I was able to achieve with PyTorch, TensorFlow, or Flux.jl. That's the good news. The not-so-good news is that while I've gotten a relatively good player, it's still not that great. It easily beats really bad players (like me) and plays roughly 50/50 against a basic MinMax heuristic (translated from https://github.com/sadeqsheikhi/reversi_python_ai).

In my training, I've done around 25 iterations (the repository is here: https://git.sr.ht/~bwanab/AZ_Reversi.jl). The loss seems to have flatlined at around iteration 10 and slopes very gradually upward after that.
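For reference, the iterations were run the way the AlphaZero.jl quickstart describes, launching training and letting it go iteration after iteration while it saves metrics and plots into a session folder. A minimal sketch, noting that the experiment name below is a placeholder and the actual entry point in AZ_Reversi.jl may differ:

```julia
using AlphaZero

# Placeholder experiment name: AZ_Reversi.jl registers its own experiment, so
# the string passed here would need to match whatever name that repo uses.
Scripts.train("reversi")
```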

Are there any particular hyper-parameters that I should look at? One thing I tried that didn't seem to make much difference was making the net a little bigger by changing the number of blocks from 5 to 8.
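The blocks change was made in the network hyperparameters. A minimal sketch of what that configuration looks like, assuming a ResNet set up the way the AlphaZero.jl connect-four example does it; the other field values here are placeholders, not necessarily the ones in AZ_Reversi.jl:

```julia
using AlphaZero

Network = NetLib.ResNet

# Sketch only: field names follow the connect-four example shipped with
# AlphaZero.jl. Only num_blocks reflects the change described above (5 -> 8);
# every other value is a placeholder.
netparams = NetLib.ResNetHP(
  num_filters=128,
  num_blocks=8,
  conv_kernel_size=(3, 3),
  num_policy_head_filters=32,
  num_value_head_filters=32,
  batch_norm_momentum=0.1)
```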

Replying to myself, but I've found that increasing the timeout when creating the AlphaZeroPlayer makes the level of play much better. For example, in the case I gave above of playing 50/50 against the MinMax heuristic, using a 5-second timeout instead of the default 2 seconds raises the level to more like 80/20. At 10 seconds, MinMax can't beat it.
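To make the timeout change concrete, here is roughly what the player construction looks like. This is a sketch only: `env` stands for whatever trained environment AZ_Reversi.jl loads from the saved session, and the exact positional argument may differ.

```julia
using AlphaZero

# `env` is assumed to be the trained AlphaZero environment loaded from the saved
# session (loading code omitted). The timeout defaults to 2 seconds; a larger
# value gives MCTS more search time per move.
player_5s  = AlphaZeroPlayer(env; timeout=5.0)   # roughly 80/20 vs. the MinMax heuristic
player_10s = AlphaZeroPlayer(env; timeout=10.0)  # MinMax no longer wins
```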

If anybody has insight into this I'd love to hear it.

Thanks for reporting on your experience! Tuning AlphaZero can indeed be pretty hard. Could you share some of the automatically generated metrics and plots from your experiment?

[Attached plots: benchmark_reward and loss]

These are the ones that seem to me to carry the most information, but that might be my ignorance.