Tencent Kaiwu Arena

This is the final project for CS3316 (Reinforcement Learning) at Shanghai Jiao Tong University.

Team Members

Yongshan Chen: Guided the research process, proposed the improvements to the PPO algorithm, and implemented the PSRO algorithm.
Lai Jiang: Wrote the Abstract, Introduction, and Conclusion sections of the paper. Proposed and polished the structure of the paper and proposed possible experiments.
Yuhao Wang: Checked the feasibility of the Truly PPO mechanism and implemented the PPO-RB part of the code. Wrote the Truly PPO parts of the Introduction, Related Work, Methods, and Conclusion sections.
Linhao Zhong: Ran the experiments, tuned the parameters, and evaluated the model. Wrote Sections 4.2 and 4.3 and part of the Introduction.
Binglin Zhou: Ran the experiments, tuned the parameters, and analyzed the evaluation results. Wrote Sections 4.1 and 4.4.

Setup

First, upload the code to the Kaiwu Arena platform. Then run the experiment with the following command:

python3 train_test.py

Acknowledgement

We would like to thank the course instructor, Prof. Weinan Zhang, TA Xialin He and Kaiwu Arena for providing the platform for this project.
