adamkarvonen / chess_llm_interpretability

Visualizing the internal board state of a GPT trained on chess PGN strings, and performing interventions on its internal board state and representation of player Elo.

Can you try to incorporate self-play technology similar to AlphaGo Zero?

win10ogod opened this issue · comments

For now, my goal is more focused on the interpretability of an LLM trained to play chess, rather than on making a better chess LLM. If I were making a better chess LLM, I would first explore using a larger model trained on more data with more compute, and identify which of those factors is the current Elo bottleneck. This would probably be more compute-efficient than a self-play approach. We have an effectively unlimited supply of human chess game data, and an existence proof in GPT-3.5-turbo-instruct that an LLM can play at around 1800 Elo when trained only on human chess games.
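
To make that concrete, here is a rough, illustrative sketch (not code from this repo) of what playing chess via PGN completion looks like: the model sees the movetext so far and emits the next SAN move, while python-chess tracks the real board and rejects illegal outputs. `complete_pgn` is a hypothetical placeholder for the model call; the stub below just picks a random legal move so the loop runs end to end.

```python
import random
import chess

def complete_pgn(pgn_prefix: str) -> str:
    """Stand-in for the LLM: given the game so far as PGN movetext, return the
    next move in SAN. This stub replays the prefix and picks a random legal
    move so the example runs; a real model would simply complete the string."""
    board = chess.Board()
    for token in pgn_prefix.split():
        if not token.endswith("."):  # skip move numbers like "1."
            board.push_san(token)
    return board.san(random.choice(list(board.legal_moves)))

def play_one_move(board: chess.Board, pgn_prefix: str, retries: int = 3) -> str:
    """Ask the model for a move and keep only outputs python-chess accepts."""
    for _ in range(retries):
        candidate = complete_pgn(pgn_prefix).strip()
        try:
            move = board.parse_san(candidate)
        except ValueError:
            continue  # illegal or unparseable output, resample
        board.push(move)
        return candidate
    raise RuntimeError("no legal move produced")

board = chess.Board()
pgn = ""
for ply in range(10):
    if board.is_game_over():
        break
    number = f"{ply // 2 + 1}. " if ply % 2 == 0 else ""
    pgn += number + play_one_move(board, pgn + number) + " "
print(pgn.strip())
```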

However, self-play with LLMs could also be interesting. I currently have no plans to explore it, but anyone is welcome to try.
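
For anyone who does want to experiment, a self-play data-collection loop could be a thin wrapper around the sketch above: let the model play both sides, discard games that stall on illegal output or never finish, and keep completed games with their results as candidate training data. This is purely hypothetical, reuses `play_one_move` and the `chess` import from the previous sketch, and is not something this repo provides.

```python
def collect_self_play_games(n_games: int, max_plies: int = 500) -> list[str]:
    """Let the model play both sides via play_one_move (defined above) and keep
    only games that reach a result within max_plies."""
    kept: list[str] = []
    while len(kept) < n_games:
        board, pgn = chess.Board(), ""
        for ply in range(max_plies):
            if board.is_game_over():
                break
            number = f"{ply // 2 + 1}. " if ply % 2 == 0 else ""
            try:
                pgn += number + play_one_move(board, pgn + number) + " "
            except RuntimeError:
                break  # model never produced a legal move; discard this game
        if board.is_game_over():
            kept.append(pgn.strip() + " " + board.result())
    return kept
```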