minnsou/suzume-jong

Description

Suzume-Jong with reinforcement learning

Folder description

baseline_results/ saved results of 1-player optimal (greedy) play or random play

baseline_run.py runs 1-player optimal (greedy) play in Suzume-Jong and saves the log

first_suzume.ipynb notebook that generated hand2optimal, hand_idx_prob, and value_iteration

hand2optimal/ npy or pickle files used to look up the optimal discard tile

mahjong_networks.py defines custom networks (mlp, conv1d, conv2d, resnet)

mahjong_utils.py Suzume-Jong utility functions such as is_win and calc_point

reward_predict.py builds a global reward predictor

suzume_env/ Suzume-Jong environment

train_ppo.py trains and tests PPO

value_iteration/ saved Q-values computed by value iteration
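For readers unfamiliar with how Q-values like those stored in value_iteration/ can be computed, here is a minimal sketch of tabular value iteration on a toy two-state problem. It is not the code from first_suzume.ipynb: the function name, the array layout (`P`, `R`), and the toy MDP are assumptions made for illustration only, and the actual Suzume-Jong state space is far larger.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular value iteration (illustrative, not the repository's code).

    P: shape (S, A, S), P[s, a, s2] is the probability of moving s -> s2 under a.
    R: shape (S, A), expected immediate reward for taking a in s.
    Returns the Q-table of shape (S, A).
    """
    n_states, n_actions, _ = P.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        V = Q.max(axis=1)              # value of acting greedily in each state
        Q_new = R + gamma * (P @ V)    # Bellman optimality backup
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

# Hypothetical toy MDP: state 1 is absorbing with zero reward.
# In state 0, action 0 ends the episode with reward 1; action 1 waits in place.
P = np.zeros((2, 2, 2))
P[0, 0, 1] = 1.0
P[0, 1, 0] = 1.0
P[1, :, 1] = 1.0
R = np.array([[1.0, 0.0],
              [0.0, 0.0]])

Q = value_iteration(P, R, gamma=0.9)
greedy_action = Q.argmax(axis=1)   # greedy policy read off the Q-table
```

Taking the argmax over the action axis of such a table gives a greedy policy, which is one plausible way a lookup of optimal discards could be derived from saved Q-values.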

Requirements: OpenAI baselines, tensorflow==1.14, keras

python baseline_run.py [-p NUM_PLAYER] [-t TOTAL_TIMESTEPS] [-og OPPONENT_PLAYER_GAMMA] [-pg PLAYER_GAMMA] [-rs REWARD_SCALE] [-pr]

python baseline_run.py -p 2 -t 300000 -og 09 -pg random -rs 20 -ps

python baseline_run.py [-l PICKLE_NAME]

python baseline_run.py -l baseline_results/2_09_09_3000000.pickle

python train_ppo.py [--env ENV_NAME] [--save_path PICKLE_PATH] [-t NUM_TIMESTEPS]

python train_ppo.py --env Suzume2-v0 --save_path save_models/default.pkl -t 30000

python train_ppo.py --play [--load_path MODEL_PATH]

python train_ppo.py --play --load_path save_models/ppo_2_5_1_99.pkl --pickle_name 09

For more details, run python train_ppo.py -h

python reward_predict.py [-p NUM_PLAYERS] [-pg PLAYER_GAMMA] [-og OPPONENT_GAMMA] [-m MODEL_NAME] [--play]

python reward_predict.py -m dense

This project is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.