yhyu13 / AlphaGOZero-python-tensorflow

Congratulations to DeepMind! This is a re-engineering implementation (drawing on many other git repos in /support/) of DeepMind's Oct 19th publication: [Mastering the Game of Go without Human Knowledge]. The supervised learning approach is more practical for individuals. (This repository is for educational purposes only.)

Go on!!

Zeta36 opened this issue:

What you have done so far looks pretty good. Please go on.

Hi, Zeta

Thanks! I'm doing my best to push commits. Hope this repo is helpful to you:)

Hello, again.
I'm following the development of your code with great interest.
I know you are still working on all this, but I've noticed that in the evaluation step you only save when the current network surpasses 40% accuracy. I think the paper says DeepMind compares the current network against the best stored one. Shouldn't the evaluation step load the last saved model and compare its accuracy against the current model?

Hi, Zeta

Right now, I am implementing the supervised learning version (which DeepMind used as a comparison to the RL approach). Therefore, a neural network doesn't have to compete with another one, as it does in RL self-play with search.

The main reason for me to consider supervised learning is that it's way faster than RL. Though the AWS P3 instance (NVIDIA Volta enabled) is powerful, I still want to take a conservative step and try out supervised learning first. Also, I am fixing a bunch of bugs, so follow the commits as you like.
[screenshot of training output, 2017-10-29 10:09 AM]

Your attention is appreciated.

Ok, I see :).

I'll be over here following your progress.

Regards!!

Boys, this repo may help you in the future: https://github.com/mokemokechicken/reversi-alpha-zero

Hi, Zeta

reversi-alpha-zero is a great project that I can learn from, especially the UCT implementation. Thanks for letting me know about that repo.
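
For reference, a minimal sketch of the PUCT selection rule that AlphaGo Zero-style UCT variants use (the `Node` attributes and `C_PUCT` value here are illustrative, not taken from either repo):

```python
import math

C_PUCT = 1.0  # exploration constant; the value here is illustrative

def select_child(node):
    """Pick the child maximizing Q + U, the PUCT rule from the AlphaGo
    Zero paper: U = c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    total_visits = sum(child.visit_count for child in node.children)
    best, best_score = None, -float("inf")
    for child in node.children:
        u = C_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        score = child.mean_value + u
        if score > best_score:
            best, best_score = child, score
    return best
```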

Regards

@yhyu13, I can see you've made a lot of progress. Is the (self-)training process already working? Did you get convergence?

My idea is, once you have something functional, to convert your code to play chess instead of Go (that's my goal).

Regards!!

@Zeta36

I am able to generate data from self-play using only one network. The second part of the self-play pipeline is to evaluate one network against another. Once I figure out the correct way to implement APV-MCTS, it should be easy to finish the self-play pipeline. Since AWS released P3 instances, it should be easier to train an SL model first (barring accidents, it should be done by the weekend of next week). As for the convergence of RL: according to DeepMind, it matches the trained SL model within 24 hours on their machines and implementation.
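
For reference, a minimal sketch of that evaluation step (the paper promotes a new checkpoint only when it wins at least 55% of evaluation games; `play_game` and the result encoding are placeholders, not code from this repo):

```python
def evaluate(candidate_net, best_net, n_games=400, win_threshold=0.55):
    """Play candidate vs. current best; promote the candidate only if it
    wins more than `win_threshold` of the games (AlphaGo Zero uses 55%)."""
    wins = 0
    for i in range(n_games):
        # Alternate colors so neither network always moves first.
        if i % 2 == 0:
            result = play_game(black=candidate_net, white=best_net)
            wins += result == "black"
        else:
            result = play_game(black=best_net, white=candidate_net)
            wins += result == "white"
    return wins / n_games > win_threshold
```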

@Zeta36

Hi, I am interested in knowing how you are going to re-engineer this project into a chess-playing agent. In particular, the action-selection space of chess is dynamic (a player can't move a captured piece). Maybe it's not a big deal. Is there any good resource to learn how to label chess moves?

Well, I made a naive attempt some months ago using SL with human player games. I warn you that the source code is ugly, but I finally got more than 40% (test) accuracy in predicting what a human would do in any board state. Here is my code: https://github.com/Zeta36/Policy-chess (I did it fast and didn't have time to improve it, so the source code is really ugly but functional. You can even play against the NN model).

You will see in my code that I figured out all the labels (possible movements) by studying thousands of games played by humans. In the end, the full action space is only about 6400 possible movements (labels) or so.

Once you have a good result with your project, I'll try to do something similar: using chess as the state environment (with the python-chess library) instead of Go, and using these labels I told you about as the action space. I'm waiting for you ;).
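
A minimal sketch of how such a label vocabulary could be built with python-chess (an illustration, not Zeta36's actual extraction code): scan games, collect each move's UCI string, and map it to an integer label.

```python
import chess.pgn

def build_move_vocab(pgn_path):
    """Map every UCI move string seen in a PGN file to an integer label."""
    vocab = {}
    with open(pgn_path) as f:
        while True:
            game = chess.pgn.read_game(f)
            if game is None:
                break
            for move in game.mainline_moves():
                uci = move.uci()  # e.g. "e2e4", or "e7e8q" for a promotion
                if uci not in vocab:
                    vocab[uci] = len(vocab)
    return vocab
```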

@Zeta36
Wait, does that mean the input of the NN is of size 8x8xN_features, and the softmax output has ~6000 classes? I can't believe it works this way. How do you manage to deal with a NaN loss induced by log(~0)?

@yhyu13, 8x8 counts each square on the board. The other dimension is for N planes with other information about the board (pieces on each square, colors, etc.). The output of ~6000 accounts for the number of possible movements in chess (these are the labels for the softmax function).

"how do you manage to deal with NaN loss function induced by log(~0)?"

TensorFlow does it for me. The optimization function takes that into account. I promise you that my code is functional and gets good convergence. You can test it.
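
Concretely, assuming the loss is built with TensorFlow's fused op (a reasonable guess, not confirmed from the Policy-chess code): `tf.nn.softmax_cross_entropy_with_logits` computes the softmax and the log in one numerically stable step, so you never evaluate `log(0)` yourself.

```python
import tensorflow as tf  # TF 1.x style, as used in this era

logits = tf.placeholder(tf.float32, [None, 6400])  # raw network outputs
labels = tf.placeholder(tf.float32, [None, 6400])  # one-hot move labels

# The fused op internally shifts logits by their max before exponentiating,
# so the loss stays finite even when a softmax probability underflows.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
```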

Regards!!

@Zeta36

Looks good! I had no idea you could successfully train on ~6,000 labels. That's six times more than ImageNet's 1,000 classes. My next question is: how long does it take? You mentioned thousands of steps; do you mean mini-batch updates?

@yhyu13, if you run my code you will see the model converges really fast. I trained it in just one day... without any GPU, just an Intel i5 processor! Moreover, I had no time (nor a good enough machine) to train on a bigger dataset (there are many thousands of PGN files out there). Nor did I have time to try a better NN model or other hyper-parameter configurations.

But yes, I'm sure that as soon as you get a good result with your project, I'll be able to replicate it in chess easily and successfully. Just tell me when your project is converging well and efficiently, and I'll start adapting it ;).

@Zeta36

Your recommendation of reversi-alpha-zero does help a lot. I've found a senior colleague who programs asynchronous/multiprocessing code very well. Hopefully, we'll be able to code a fast MCTS.
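
For the multiprocessing side, a minimal sketch with Python's standard library (illustrative only; `run_selfplay_game` is a placeholder for whatever plays one game, and real implementations typically also batch NN evaluations across workers):

```python
from multiprocessing import Pool

def run_selfplay_game(seed):
    """Placeholder: play one self-play game and return its (state, pi, z) records."""
    return []

if __name__ == "__main__":
    # Run self-play games in parallel worker processes.
    with Pool(processes=4) as pool:
        games = pool.map(run_selfplay_game, range(100))
```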

@Zeta36

What is your hardware? Maybe you can run some benchmarks with me.

Try:

```
python main.py --mode=selfplay
```

@yhyu13, I would really like to help you with that, but I have no GPU and my CPU is an Intel i5 :(

I'm sorry.

@Zeta36

No worries! Since I am still improving my implementation, it's better to have a reference other than myself.

@Zeta36

I have a question: how did you train the chess-playing model? Did you rent AWS?

I am almost done with the supervised learning approach (I can even train a 6-layer model, about 7M weights, on my computer). But instead of learning Go from scratch, I'd like to give chess a try.

But my concern is the expansion of the MCTS tree: in Go, there are at most 362 legal moves (19^2 + pass), but according to you, chess has over 6,000 total moves, so it would be hard to obtain a sensible search tree.

Well, in chess at any point you have on average only about 20 legal moves with which to expand the search tree, not the whole ~6000-move space. Moreover, a chess game usually lasts only about 30 moves on average, and the last ones usually have 10 or fewer legal moves available.

By contrast, in Go each search expansion has the whole ~300 legal moves available for a long time, before the board starts getting full.

@Zeta36

But if that's true, doesn't the output of the neural network have to be dynamic in this case?

No, why dynamic? It's really the same as in Go, but instead of 362 labels you've got around 6000. The rest is the same. If I have time this weekend, I'll try to adapt the reversi-zero project to show you.
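
In other words, the policy head stays fixed-size; at search time you simply mask out illegal moves and renormalize the priors. A hedged sketch (the `vocab` label mapping is the hypothetical one from the earlier sketch):

```python
import chess

def legal_priors(board: chess.Board, policy, vocab):
    """Keep only the priors of legal moves and renormalize them.

    `policy` is the full softmax output (~6400 entries); `vocab` maps
    UCI move strings to label indices."""
    priors = {}
    for move in board.legal_moves:  # typically ~20 moves, not 6400
        idx = vocab.get(move.uci())
        if idx is not None:
            priors[move] = policy[idx]
    total = sum(priors.values())
    if total > 0:
        priors = {m: p / total for m, p in priors.items()}
    return priors
```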

@Zeta36

I believe I am almost done (apart from copying the TensorFlow model to another graph so it can compete against the best model). I will have a Thanksgiving break (7 days) in America; let me know if I can help out.
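
For that last piece, one standard TF 1.x pattern is to give each network its own graph and session so two checkpoints can be restored side by side (the `build_network` builder and the checkpoint paths below are placeholders):

```python
import tensorflow as tf

def load_model(checkpoint_path):
    """Restore one network into its own graph/session pair."""
    graph = tf.Graph()
    with graph.as_default():
        net = build_network()  # placeholder: builds the model ops
        saver = tf.train.Saver()
    sess = tf.Session(graph=graph)
    saver.restore(sess, checkpoint_path)
    return graph, sess, net

best = load_model("checkpoints/best_model")        # illustrative paths
candidate = load_model("checkpoints/current_model")
```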

Yes, @yhyu13. I'm adapting the reversi-zero project to chess right now. I've already done the chess environment and the self-play worker. This weekend I'll do the optimization and evaluation parts, and I will upload everything to git. Once I create the git repo, you can help me with your GPU to train the model and to fix any bugs you see.

If we succeed, we can do the same process later in your project. You will see it's very easy to convert this kind of project to chess.

I'll tell you soon.

Regards!!

@Zeta36

Nice! I will try it out on my computer first once you are ready. Is there anything special I need to take care of? E.g., profiling, logging?

And would you like to train on an AWS server or on my machine? If you need AWS, it's better to start today, because GPU instances need approval (it takes up to three business days).

Regards