This code base implements an agent for the Lunar Lander environment using the Monte-Carlo Policy Gradient (REINFORCE) algorithm with a baseline function.
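For background, REINFORCE with a baseline subtracts a state-value estimate from each Monte-Carlo return, so the policy gradient is weighted by an advantage rather than the raw return, which lowers variance without adding bias. Below is a minimal NumPy sketch of the return-and-advantage computation; it is illustrative only, and the names are not taken from `reinforce_baseline.py`:

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute Monte-Carlo returns G_t = r_t + gamma * G_{t+1}."""
    returns = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# Example episode: one reward per step.
rewards = np.array([1.0, 0.0, -1.0, 2.0])
returns = discounted_returns(rewards)

# A constant mean-return baseline is used here purely for illustration;
# the repo's baseline function may instead be a learned state-value model.
# The advantage weights the log-prob term in the REINFORCE update:
#   grad J ~ advantage * grad log pi(a|s)
baseline = returns.mean()
advantages = returns - baseline
print(advantages)
```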
- `git clone` the repository.
- Enter the directory of the repository:
```
cd Lunar-Lander-RL
```
and open two terminal windows.
- Training: run in terminal window 1:
```
python3 reinforce_baseline.py --task train
```
- To visualize live plots, run in terminal window 2:
```
python plotting.py
```
- You can change almost all variables by editing their values in `reinforce_baseline.py` (see the sketch after this list).
- After completion, the directory will have two new `.txt` files, two image files, and two directories with saved models. Do not change the location of those files before testing.
- Testing: run
```
python reinforce_baseline.py --task test --exp_no 1
```
Change the experiment number `1` to any integer `k`; it selects the folder `Saved_Exps/Test_k` where all the results will be saved.
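As a loose illustration of what those tunable variables might look like near the top of `reinforce_baseline.py` (all names and values below are hypothetical, not the actual identifiers in the script):

```python
# Hypothetical hyperparameter block; the real names and values live in
# reinforce_baseline.py and may differ.
GAMMA = 0.99            # discount factor for Monte-Carlo returns
POLICY_LR = 1e-3        # learning rate for the policy network
BASELINE_LR = 1e-2      # learning rate for the baseline network
MAX_EPISODES = 2000     # training episodes before stopping
```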
Around 20 experiments were run with varying hyperparameters. The best result in terms of solving the task (reaching a 200+ reward over 100 consecutive episodes in the fewest training episodes) is `Test_11`, although some tests scored higher rewards over 100 consecutive episodes than the model in `Test_11`. Each folder has a `logfile.txt` recording the hyperparameter values, a `test_log.txt` with test results, and a `reward_log.txt` with per-episode logs from training.
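If you want to check the solve criterion against a training log yourself, a sketch like the following works, assuming `reward_log.txt` holds one episode reward per line (the actual file format may differ):

```python
import numpy as np

# Assumes one scalar episode reward per line; adjust the parsing
# if the actual reward_log.txt format differs.
rewards = np.loadtxt("Saved_Exps/Test_11/reward_log.txt")

# Solved once the reward over a window of 100 consecutive episodes
# averages 200+.
window = 100
for end in range(window, len(rewards) + 1):
    if rewards[end - window:end].mean() >= 200:
        print(f"Solved at episode {end}")
        break
else:
    print("Criterion not met in this log")
```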
- TensorFlow
- Gym
- NumPy
- Matplotlib
Install them using `pip`.
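For example:
```
pip install tensorflow gym numpy matplotlib
```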
Please feel free to create a Pull Request for any suggested improvements or errors in the code. If you are a beginner, you can refer to this for getting started.
If you found this useful, please consider starring (★) the repo so that it can reach a broader audience.
This project is licensed under the MIT License - see the LICENSE file for details.