google-deepmind / mctx

Monte Carlo tree search in JAX

Possibility of multi-action expansion for leaf nodes

NullBy1e opened this issue

I've started working with your library and I'm wondering about the feasibility of implementing multi-action expansion for leaf nodes, particularly for scenarios with small search spaces, where this could significantly improve the quality of the search tree.

Is multi-action expansion for leaf nodes currently possible with the library, through the use of RecurrentFn or otherwise?

If not directly supported, are there any existing functions or methods that could be easily modified to achieve this? (I'm currently looking at the expand and simulate functions, but that would also require modifying the backpropagation function in search.py.)

If possible, could you point me in the right direction for implementing this?

I'm still trying to understand the full library so any guidance or suggestions would be greatly appreciated. Thank you!
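
For reference, here is the single-action `recurrent_fn` interface the question is about, with a placeholder body (the zero rewards/values and the 4-action space are just assumptions for the sketch):

```python
import jax.numpy as jnp
import mctx

# The standard mctx recurrent_fn signature: `action` has a single batch
# dimension [B], i.e. one action per batch element. The question is
# about widening this to [B, num_actions].
def recurrent_fn(params, rng_key, action, embedding):
    del params, rng_key  # a real model would use these
    batch_size = action.shape[0]
    num_actions = 4  # assumed action-space size for this sketch
    output = mctx.RecurrentFnOutput(
        reward=jnp.zeros(batch_size),        # placeholder reward
        discount=jnp.ones(batch_size),       # placeholder discount
        prior_logits=jnp.zeros((batch_size, num_actions)),
        value=jnp.zeros(batch_size),         # placeholder value
    )
    return output, embedding  # dummy dynamics: embedding unchanged
```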

What is multi-action expansion?

Multi-action expansion as in creating more than one child from a leaf node. Currently, in search.py at line 100, expand takes only the single action picked by the simulate function; I'd like expand to be able to take multiple actions at once.

The current version of expand only takes one action per batch element, i.e. actions of shape [B]; in my scenario it would need to accept [B, num_actions] to account for more actions, unless I'm missing an easier method?
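
A rough sketch of the kind of multi-action step I have in mind, assuming we keep the standard single-action `recurrent_fn` and simply `jax.vmap` it over a `[B, num_actions]` action array (the helper name `multi_action_step` is mine, not part of mctx):

```python
import jax

def multi_action_step(recurrent_fn, params, rng_key, actions, embedding):
    """Evaluate several candidate actions from the same parent embedding.

    actions: [B, num_actions] int array; embedding: batched pytree.
    Returns the RecurrentFnOutput and child embeddings stacked along a
    new leading num_actions axis, i.e. with shapes [num_actions, B, ...].
    """
    def step(action_column):
        # action_column: [B], one candidate action per batch element.
        # params, rng_key and embedding are closed over, so every
        # candidate is expanded from the same parent state.
        return recurrent_fn(params, rng_key, action_column, embedding)

    # vmap over the num_actions axis: actions.T has shape [num_actions, B].
    return jax.vmap(step)(actions.T)
```

The model evaluation itself batches cleanly this way; the harder part, as noted above, would be threading the extra children through expand and the backpropagation in search.py.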

A more useful explanation would be that in a single-action expansion step in chess, we add only one new position, the one the policy network + PUCT picks as most probable/advantageous, knight to e4 perhaps.

A multi-action expansion would create multiple new positions, say the top 2 moves from that policy network + PUCT, so knight to e3 and queen to d5, for example.

It's still from the same parent node as in single-action expansion; we just add more child nodes per MCTS 'iteration'.
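
For instance, picking those top-2 candidate moves from the policy prior could look like this (a toy sketch; the logits are made up):

```python
import jax
import jax.numpy as jnp

# Toy example: B=1 batch element, 4 legal moves.
prior_logits = jnp.array([[0.1, 2.3, -0.5, 1.7]])
top_values, top_actions = jax.lax.top_k(prior_logits, k=2)
# top_actions has shape [B, 2] -- e.g. the indices encoding
# "knight to e3" and "queen to d5".
```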

I hope this makes a bit more sense now and clears up my phrasing a little bit!

Thanks for the explanation. You can fork mctx and experiment with different search variants.
The resulting speed will depend on the batch size used and the available hardware.

Naively, multi-action expansion is similar to visiting the same node again and selecting a different action there.
The tree search can itself visit the same node and select a different action there, if it considers such a path promising.
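
A sketch of that suggestion in code: keep the standard single-action `recurrent_fn` and just raise `num_simulations`, so PUCT can revisit the same node and pick a different action there (toy model, all values placeholders):

```python
import jax
import jax.numpy as jnp
import mctx

batch_size, num_actions = 1, 4

def recurrent_fn(params, rng_key, action, embedding):
    del params, rng_key, action  # dummy dynamics: the state never changes
    output = mctx.RecurrentFnOutput(
        reward=jnp.zeros(batch_size),
        discount=jnp.ones(batch_size),
        prior_logits=jnp.zeros((batch_size, num_actions)),
        value=jnp.zeros(batch_size),
    )
    return output, embedding

root = mctx.RootFnOutput(
    prior_logits=jnp.zeros((batch_size, num_actions)),
    value=jnp.zeros(batch_size),
    embedding=jnp.zeros((batch_size, 1)),
)

policy_output = mctx.muzero_policy(
    params=(),
    rng_key=jax.random.PRNGKey(0),
    root=root,
    recurrent_fn=recurrent_fn,
    num_simulations=64,  # more simulations -> more chances to revisit a node
)
```

With enough simulations, the PUCT exploration term pulls the search back to a promising parent and selects its second-best action, which approximates expanding the top-k children at once.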

Hmm, that's a good point. I'll experiment with more simulations to check whether that behaviour emerges by itself; it certainly should, given the confidence bounds.

Thanks a lot for the feedback, and thanks for the time taken!