mcts

There are 13 repositories under the mcts topic.

hijkzzz / Awesome-LLM-Strawberry
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
chain-of-thought coding llm mathematics mcts openai-o1 reinforcement-learning strawberry
6838
suragnair / alpha-zero-general
A clean implementation based on AlphaZero for any game in any framework + tutorial + Othello/Gobang/TicTacToe/Connect4 and more
tensorflow pytorch keras gobang gomoku alpha-zero alphago-zero alphago reinforcement-learning self-play mcts monte-carlo-tree-search othello tf deep-learning alphazero neural-network
Language:Jupyter Notebook 4297
junxiaosong / AlphaZero_Gomoku
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
alphago alphago-zero alphazero board-game gobang gomoku mcts monte-carlo-tree-search pytorch reinforcement-learning rl self-learning tensorflow
Language:Python 3543
werner-duvaud / muzero-general
MuZero
alphago alphazero deep-learning deep-reinforcement-learning gym machine-learning mcts model-based-rl monte-carlo-tree-search muzero muzero-general neural-network python3 pytorch reinforcement-learning residual-network rl self-learning tensorboard
Language:Python 2715
opendilab / LightZero
[NeurIPS 2023 Spotlight] LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios (awesome MCTS)
alpha-beta-pruning alphazero atari board-game board-games continuous-control efficientzero gomoku gumbel-muzero gym mcts mcts-algorithm monte-carlo-tree-search muzero pytorch reinforcement-learning sampled-muzero self-play stochastic-muzero tictactoe
Language:Python 1457
zzli2022 / Awesome-System2-Reasoning-LLM
Latest Advances on System-2 Reasoning
benchmark macro-action mcts o1 o3 prm r1 reasoning rl self-improve slow-fast system-2
Language:Python 1265
yaotingwangofficial / Awesome-MCoT
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
chain-of-thought cot deepseek-r1 instruction-tuning large-vision-language-model mllm-reasoning multimodal multimodal-chain-of-thought multimodal-large-language-models openai-o1 reasoning survey mcts slow-thinking system-2
877
chauvinSimon / My_Bibliography_for_Research_on_Autonomous_Driving
Personal notes about scientific and research works on "Decision-Making for Autonomous Driving"
pomdp mdp reinforcement-learning inverse-reinforcement-learning belief-planning planning model-based-reinforcement-learning decision-making decision-making-under-uncertainty game-theory mcts prediction bibliography behavioral-cloning carla imitation-learning end-to-end interaction intention risk-assessment
462
s-casci / tinyzero
Easily train AlphaZero-like agents on any environment you want!
alphazero mcts reinforcement-learning
Language:Python 431
hrpan / tetris_mcts
MCTS project for Tetris
reinforcement-learning mcts tetris deep-learning game tetris-bots
Language:Python 349
dylandjian / SuperGo
A student implementation of Alpha Go Zero
alphago alphago-zero machine-learning mcts python3 pytorch reinforcement-learning
Language:Python 281
QueensGambit / CrazyAra
A Deep Learning UCI-Chess Variant Engine written in C++ & Python 🦜
python crazyhouse chess-engine deep-learning artificial-intelligence convolutional-neural-network mcts alphazero mxnet gluon open-source machine-learning lichess python-chess alphago mcgs
Language:Jupyter Notebook 278
DataCanvasIO / Hypernets
A General Automated Machine Learning framework to simplify the development of End-to-end AutoML toolkits in specific domains.
autodl automl enas evolutionary-algorithms hyperparameter-optimization hyperparameter-tuning keras mcts monte-carlo-tree-search nas nasnet neural-architecture-search reinforcement-learning
Language:Python 265
vgarciasc / mcts-viz
Visualization of MCTS algorithm applied to Tic-tac-toe.
mcts tictactoe visualization p5js
Language:JavaScript 253
sungyubkim / Deep_RL_with_pytorch
A PyTorch tutorial for DRL (Deep Reinforcement Learning)
deep-reinforcement-learning pytorch dqn a2c ppo soft-actor-critic self-imitation-learning random-network-distillation c51 qr-dqn iqn gail mcts uct counterfactual-regret-minimization hedge
Language:Jupyter Notebook 219
initial-h / AlphaZero_Gomoku_MPI
An asynchronous/parallel implementation of the AlphaGo Zero algorithm for Gomoku
alphazero alphazero-gomoku parallel mpi4py tensorflow alphago mcts gomoku tensorlayer tree-search algorithm deep-reinforcement-learning dirichlet-distribution
Language:Python 212
JARVIS-Xs / SE-Agent
SE-Agent is a self-evolution framework for LLM code agents. It enables trajectory-level evolution to exchange information across reasoning paths via Revision, Recombination, and Refinement, expanding the search space and escaping local optima. On SWE-bench Verified, it achieves SOTA performance.
claude-code code-agent code-fix mcts self-evolve swe-agent swe-bench test-time-scaling
Language:Python 193
thuxugang / doudizhu
AI for Dou Dizhu (斗地主)
ai card-game dqn reinforcement-learning doudizhu mcts
Language:Python 184
kaesve / muzero
A clean implementation of MuZero and AlphaZero following the AlphaZero General framework. Train and Pit both algorithms against each other, and investigate reliability of learned MuZero MDP models.
muzero alphazero reinforcement-learning tensorflow tensorflow2 mcts tf2 deep-learning deep-reinforcement-learning
Language:Jupyter Notebook 164
zjeffer / chess-deep-rl
Research project: create a chess engine using Deep Reinforcement Learning
ai alphazero artificial-intelligence chess chess-engine deep-learning deep-reinforcement-learning machine-learning mcts neural-network neural-networks reinforcement-learning
Language:Jupyter Notebook 158
PuYuuu / vehicle-interaction-decision-making
Decision-making for multiple vehicles at an intersection, based on level-k game theory and MCTS
game-theory level-k mcts
Language:C++ 146
akolishchak / doom-net-pytorch
Reinforcement learning models in ViZDoom environment
pytorch vizdoom reinforcement-learning doom agent reinforcement learning ppo mcts doomnet-track1 behavior-tree
Language:Python 130
CGLemon / Sayuri
AlphaZero based engine for the game of Go (圍棋/围棋).
mcts weiqi baduk alphago deeplearning sayuri alphazero gumbel-alphazero
Language:C++ 114
rlglab / minizero
[IEEE ToG] MiniZero: An AlphaZero and MuZero Training Framework
alphazero deep-reinforcement-learning gumbel-alphazero gumbel-muzero mcts muzero monte-carlo-tree-search atari board-games go gomoku hex nogo othello outer-open-gomoku tictactoe killall-go reinforcement-learning
Language:C++ 109
YoujiaZhang / AlphaGo-Zero-Gobang
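AlphaGo-Zero-Gobang is a reinforcement-learning-based Gobang (five-in-a-row) model, intended mainly as a demo for understanding how AlphaGo Zero works: how the neural network guides MCTS in making decisions, and how it learns through self-play. Source code + tutorial.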
ai alphago alphazero deep-learning gobang gomuku gui mcts residual-networks tensorflow
Language:Python 109
manyoso / allie
Allie: A UCI compliant chess engine
alphabeta alphazero chess chess-engine deepmind mcts neural-network
Language:C++ 107
lowrollr / turbozero
fast + parallel AlphaZero in JAX
alphazero gpu-acceleration monte-carlo-tree-search reinforcement-learning vectorization mcts jax
Language:Python 102
Urinx / ReinforcementLearning
Reinforcing Your Learning of Reinforcement Learning
reinforcement-learning alphago-zero mcts q-learning policy-gradient gomoku frozenlake doom cartpole tic-tac-toe atari-2600 space-invaders ppo advantage-actor-critic dqn alphago ddpg
Language:Python 96
blanyal / alpha-zero
AlphaZero implementation for Othello, Connect-Four and Tic-Tac-Toe based on "Mastering the game of Go without human knowledge" and "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" by DeepMind.
alphazero alpha-zero alphago-zero tensorflow reinforcement-learning mcts tictactoe self-play game deep-learning machine-learning resnet connect-four connect4 othello reversi tic-tac-toe deepmind
Language:Python 91
Wangmerlyn / MCTS-GSM8k-Demo
A repo showcasing the use of MCTS with LLMs to solve GSM8K problems
llm-inference llms mcts
Language:Python 91
gorisanson / quoridor-ai
Quoridor AI based on Monte Carlo tree search
ai mcts monte-carlo-tree-search quoridor quoridor-game
Language:JavaScript 85
kobanium / Ray
Computer Go engine using Monte-Carlo Tree Search (MCTS)
ray go weiqi baduk mcts monte-carlo-tree-search
Language:C++ 78
masouduut94 / MCTS-agent-python
Monte Carlo Tree Search (MCTS) is a method for finding optimal decisions in a given domain by taking random samples in the decision space and building a search tree accordingly. It has already had a profound impact on Artificial Intelligence (AI) approaches for domains that can be represented as trees of sequential decisions, particularly games and planning problems. In this project I used the board game Hex as a platform to test different simulation strategies in the MCTS field.
decision-space game-of-hex markov-decision-processes mcts monte-carlo-tree-search reinforcement-learning sequential-decisions
Language:Python 72
kobanium / TamaGo
Computer Go engine using Monte-Carlo Tree Search, written in Python 3.
baduk go weiqi mcts monte-carlo-tree-search deep-learning go-text-protocol alphago alphago-zero alphagozero gumbel-alphazero reinforcement-learning
Language:Python 71
CGLemon / pyDLGO
A deep-learning-based GTP Go (圍棋/围棋) engine, with a KGS guide and algorithm tutorials.
alphago baduk deep-learning game-of-go goban mcts weiqi
Language:Python 68
coreylowman / synthesis
A Rust implementation of the AlphaZero algorithm
rust alphazero mcts connect4-game base65536 pytorch neural-network machine-learning deep-learning
Language:Rust 53
