yhyu13 / AlphaGOZero-python-tensorflow

Congratulations to DeepMind! This is a re-engineering implementation (drawing on many other git repos in /support/) of DeepMind's Oct 19th publication: [Mastering the Game of Go without Human Knowledge]. The supervised learning approach is more practical for individuals. (This repository is for educational purposes only.)

Go on!!

Zeta36 opened this issue:

What you have done so far looks pretty good. Please go on.

Hi, Zeta

Thanks! I'm doing my best to push commits. Hope this repo is helpful to you:)

Hello, again.
I'm following the development of your code with great interest.
I know you are still working on all this, but I've noticed that in the evaluation step you only save when the current network surpasses 40% accuracy. I think the paper says DeepMind compares the current network against the best stored one. Shouldn't the evaluation step load the last saved model and compare its accuracy against the current model?

Hi, Zeta

Right now, I am implementing the supervised learning version (which DeepMind used as a comparison to the RL approach). Therefore, a neural network doesn't have to compete with another one, as it does in RL self-play with search.

The main reason for me to consider supervised learning is that it's way faster than RL. Though the AWS P3 instance (NVIDIA Volta enabled) is powerful, I still want to take a conservative step and try out supervised learning first. Also, I am fixing a bunch of bugs, so follow the commits as you like.
[screenshot of training output, 2017-10-29 10:09 AM]

Your attention is appreciated.

Ok, I see :).

I'll be over here following your progress.

Regards!!

Boys, this repo may help you in the future: https://github.com/mokemokechicken/reversi-alpha-zero

Hi, Zeta

reversi-alpha-zero is a great project that I can learn from, especially the UCT implementation. Thanks for letting me know about that repo.
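
For reference, a minimal sketch of the PUCT selection rule that AlphaGo Zero-style UCT variants use (the `Node` attributes and `C_PUCT` value here are illustrative, not taken from either repo):

```python
import math

C_PUCT = 1.0  # exploration constant; the value here is illustrative

def select_child(node):
    """Pick the child maximizing Q + U, the PUCT rule from the AlphaGo
    Zero paper: U = c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    total_visits = sum(child.visit_count for child in node.children)
    best, best_score = None, -float("inf")
    for child in node.children:
        u = C_PUCT * child.prior * math.sqrt(total_visits) / (1 + child.visit_count)
        score = child.mean_value + u
        if score > best_score:
            best, best_score = child, score
    return best
```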

Regards

@yhyu13, I can see you've made a lot of progress. Is the (self-)training process already working? Did you get convergence?

My idea is, once you have something functional, to convert your code to play chess instead of Go (that's my goal).

Regards!!

@Zeta36

I am able to generate data from self-play using only one network. The second part of the self-play pipeline is to evaluate one network against another. Once I figure out the correct way to implement APV-MCTS, it should be easy to finish the self-play pipeline. Since AWS released P3 instances, it should be easier to train an SL model first (barring accidents, it should be done by the weekend of next week). As for the convergence of RL: according to DeepMind, it matches the trained SL model within 24 hours on their machines and implementation.
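
For reference, a minimal sketch of that evaluation step (the paper promotes a new checkpoint only when it wins at least 55% of evaluation games; `play_game` and the result encoding are placeholders, not code from this repo):

```python
def evaluate(candidate_net, best_net, n_games=400, win_threshold=0.55):
    """Play candidate vs. current best; promote the candidate only if it
    wins more than `win_threshold` of the games (AlphaGo Zero uses 55%)."""
    wins = 0
    for i in range(n_games):
        # Alternate colors so neither network always moves first.
        if i % 2 == 0:
            result = play_game(black=candidate_net, white=best_net)
            wins += result == "black"
        else:
            result = play_game(black=best_net, white=candidate_net)
            wins += result == "white"
    return wins / n_games > win_threshold
```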

@Zeta36

Hi, I am interested in knowing how you are going to re-engineer this project into a chess-playing agent. In particular, the action-selection space of chess is dynamic (a player can't move a captured piece). Maybe it's not a big deal. Is there any good resource to learn how to label chess moves?

Well, I made a naive attempt some months ago using SL with human player games. I warn you that the source code is ugly, but I finally got more than 40% (test) accuracy in predicting what a human would do in any board state. Here is my code: https://github.com/Zeta36/Policy-chess (I did it fast and didn't have time to improve it, so the source code is really ugly but functional. You can even play against the NN model).

You will see in my code that I figured out all the labels (possible movements) by studying thousands of games played by humans. In the end, the full action space is only about 6400 possible movements (labels) or so.

Once you have a good result with your project, I'll try to do something similar: using chess as the state environment (with the python-chess library) instead of Go, and using these labels I told you about as the action space. I'm waiting for you ;).
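
A minimal sketch of how such a label vocabulary could be built with python-chess (an illustration, not Zeta36's actual extraction code): scan games, collect each move's UCI string, and map it to an integer label.

```python
import chess.pgn

def build_move_vocab(pgn_path):
    """Map every UCI move string seen in a PGN file to an integer label."""
    vocab = {}
    with open(pgn_path) as f:
        while True:
            game = chess.pgn.read_game(f)
            if game is None:
                break
            for move in game.mainline_moves():
                uci = move.uci()  # e.g. "e2e4", or "e7e8q" for a promotion
                if uci not in vocab:
                    vocab[uci] = len(vocab)
    return vocab
```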

@Zeta36
Wait, does that mean the input of the NN is of size 8x8xN_features, and the softmax output has ~6000 classes? I can't believe it works this way. How do you manage to deal with a NaN loss induced by log(~0)?

@yhyu13, 8x8 counts each square on the board. The other dimension is for N planes with other information about the board (pieces on each square, colors, etc.). The output of ~6000 accounts for the number of possible movements in chess (these are the labels for the softmax function).

"how do you manage to deal with NaN loss function induced by log(~0)?"

TensorFlow does it for me. The optimization function takes that into account. I promise you that my code is functional and gets good convergence. You can test it.
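
Concretely, assuming the loss is built with TensorFlow's fused op (a reasonable guess, not confirmed from the Policy-chess code): `tf.nn.softmax_cross_entropy_with_logits` computes the softmax and the log in one numerically stable step, so you never evaluate `log(0)` yourself.

```python
import tensorflow as tf  # TF 1.x style, as used in this era

logits = tf.placeholder(tf.float32, [None, 6400])  # raw network outputs
labels = tf.placeholder(tf.float32, [None, 6400])  # one-hot move labels

# The fused op internally shifts logits by their max before exponentiating,
# so the loss stays finite even when a softmax probability underflows.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
```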

Regards!!

@Zeta36

Looks good! I had no idea you could successfully train on ~6,000 labels. That's six times more than ImageNet's 1,000 classes. My next question is: how long does it take? You mentioned thousands of steps; do you mean mini-batch updates?

@yhyu13, if you run my code you will see the model converges really fast. I trained it in just one day... without any GPU, just an Intel i5 processor! Moreover, I had no time (nor a good enough machine) to train on a bigger dataset (there are many thousands of PGN files out there). Nor did I have time to try a better NN model or other hyper-parameter configurations.

But yes, I'm sure that as soon as you get a good result with your project, I'll be able to replicate it in chess easily and successfully. Just tell me when your project is converging well and efficiently, and I'll start adapting it ;).

@Zeta36

Your recommendation of reversi-alpha-zero does help a lot. I've found a senior colleague who programs asynchronous/multiprocessing code very well. Hopefully, we'll be able to code a fast MCTS.
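
For the multiprocessing side, a minimal sketch with Python's standard library (illustrative only; `run_selfplay_game` is a placeholder for whatever plays one game, and real implementations typically also batch NN evaluations across workers):

```python
from multiprocessing import Pool

def run_selfplay_game(seed):
    """Placeholder: play one self-play game and return its (state, pi, z) records."""
    return []

if __name__ == "__main__":
    # Run self-play games in parallel worker processes.
    with Pool(processes=4) as pool:
        games = pool.map(run_selfplay_game, range(100))
```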

@Zeta36

What is your hardware? Maybe you can run some benchmarks with me.

Try:

```
python main.py --mode=selfplay
```

@yhyu13, I would really like to help you with that, but I have no GPU and my CPU is an Intel i5 :(

I'm sorry.

@Zeta36

No worries! Since I am still improving my implementation, it's better to have a reference other than myself.

@Zeta36

I have a question: how did you train the chess-playing model? Did you rent AWS?

I am almost done with the supervised learning approach (I can even train a 6-layer model, about 7M weights, on my computer). But instead of learning Go from scratch, I'd like to give chess a try.

But my concern is the expansion of the MCTS tree: in Go, there are at most 362 legal moves (19^2 + pass), but according to you, chess has over 6,000 total moves, so it would be hard to obtain a sensible search tree.

Well, in chess at any point you have on average only about 20 legal moves with which to expand the search tree, not the whole ~6000-move space. Moreover, a chess game usually lasts only about 30 moves on average, and the last ones usually have 10 or fewer legal moves available.

By contrast, in Go each search expansion has the whole ~300 legal moves available for a long time, before the board starts getting full.

@Zeta36

But if that's true, doesn't the output of the neural network have to be dynamic in this case?

No, why dynamic? It's really the same as in Go, but instead of 362 labels you've got around 6000. The rest is the same. If I have time this weekend, I'll try to adapt the reversi-zero project to show you.
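
In other words, the policy head stays fixed-size; at search time you simply mask out illegal moves and renormalize the priors. A hedged sketch (the `vocab` label mapping is the hypothetical one from the earlier sketch):

```python
import chess

def legal_priors(board: chess.Board, policy, vocab):
    """Keep only the priors of legal moves and renormalize them.

    `policy` is the full softmax output (~6400 entries); `vocab` maps
    UCI move strings to label indices."""
    priors = {}
    for move in board.legal_moves:  # typically ~20 moves, not 6400
        idx = vocab.get(move.uci())
        if idx is not None:
            priors[move] = policy[idx]
    total = sum(priors.values())
    if total > 0:
        priors = {m: p / total for m, p in priors.items()}
    return priors
```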

@Zeta36

I believe I am almost done (apart from copying the TensorFlow model to another graph so it can compete against the best model). I will have a Thanksgiving break (7 days) in America; let me know if I can help out.
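
For that last piece, one standard TF 1.x pattern is to give each network its own graph and session so two checkpoints can be restored side by side (the `build_network` builder and the checkpoint paths below are placeholders):

```python
import tensorflow as tf

def load_model(checkpoint_path):
    """Restore one network into its own graph/session pair."""
    graph = tf.Graph()
    with graph.as_default():
        net = build_network()  # placeholder: builds the model ops
        saver = tf.train.Saver()
    sess = tf.Session(graph=graph)
    saver.restore(sess, checkpoint_path)
    return graph, sess, net

best = load_model("checkpoints/best_model")        # illustrative paths
candidate = load_model("checkpoints/current_model")
```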

Yes, @yhyu13. I'm adapting the reversi-zero project to chess right now. I've already done the chess environment and the self-play worker. This weekend I'll do the optimization and evaluation parts, and I will upload everything to git. Once I create the git repo, you can help me with your GPU to train the model and to fix any bugs you see.

If we succeed, we can do the same process later in your project. You will see it's very easy to convert this kind of project to chess.

I'll tell you soon.

Regards!!

@Zeta36

Nice! I will try it out on my computer first once you are ready. Is there anything special I need to take care of? E.g., profiling, logging?

And would you like to train on an AWS server or on my machine? If you need AWS, it's better to start today, because GPU instances need approval (it takes up to three business days).

Regards