- You have to submit a report in HTML and your code to Canvas.
- Put all your files (HTML report and code) into a single compressed folder named `Lastname_Firstname_A1.zip`.
- If you are using Jupyter Notebook, you can export it to HTML from the top toolbar: "File -> Save and Export Notebook As... -> HTML". If you are using Google Colab, you might need some extra steps to produce an HTML report. Please Google for "how to convert ipynb notebook to HTML in Google Colab?".
- This homework is self-contained in one Jupyter notebook. In your zip, we expect only your HTML report and one Jupyter notebook.
- If you wish to complete this assignment locally (not on Google Colab), you need to install Jupyter Notebook. You can do `pip install jupyter notebook`.
- In this assignment, you will play around with the famous FrozenLake environment. Please install Gymnasium (you can read more about Gymnasium here): `pip install gymnasium`.
- It is strongly advised that you learn how to use a virtual environment for Python. It creates an environment isolated from the system Python and from other Python releases you have installed system-wide. It helps you manage Python packages cleanly and allows you to install only the packages a particular project needs. An exemplary, lightweight virtual environment module is venv (link). Your Python distribution is likely to include it by default. If not, for example on Ubuntu, you can install it with `sudo apt-get install python3-venv`.
In this assignment, you will implement planning (dynamic programming) algorithms on the FrozenLake environment from Gymnasium (Link).
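To give a feel for what a dynamic programming planner looks like on a tabular model, here is a minimal sketch of Q-value iteration on a made-up three-state chain. It is not the FrozenLake map and not a reference solution. The toy model is written in the same shape that Gymnasium's FrozenLake exposes through `env.unwrapped.P`, namely `P[state][action]` as a list of `(prob, next_state, reward, terminated)` tuples. The names `TOY_P`, `q_value_iteration`, and `greedy_policy`, the discount `gamma=0.9`, and the choice of the max norm for the residual are all assumptions made for illustration.

```python
# Toy 3-state chain in the shape P[state][action] -> [(prob, next_state, reward, terminated), ...].
# The numbers are invented for illustration; this is NOT the FrozenLake map.
TOY_P = {
    0: {0: [(1.0, 0, 0.0, False)],
        1: [(0.8, 1, 0.0, False), (0.2, 0, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)],
        1: [(0.8, 2, 1.0, True), (0.2, 1, 0.0, False)]},
    2: {0: [(1.0, 2, 0.0, True)],   # absorbing goal state
        1: [(1.0, 2, 0.0, True)]},
}


def q_value_iteration(P, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Return (Q, residuals) where residuals[k-1] = max-norm of Q_k - Q_{k-1}."""
    Q = {s: {a: 0.0 for a in P[s]} for s in P}
    residuals = []
    for _ in range(max_iters):
        new_Q = {}
        for s in P:
            new_Q[s] = {}
            for a in P[s]:
                backup = 0.0
                for prob, s_next, reward, terminated in P[s][a]:
                    # Do not bootstrap from a transition that ends the episode.
                    future = 0.0 if terminated else max(Q[s_next].values())
                    backup += prob * (reward + gamma * future)
                new_Q[s][a] = backup
        # Assumption: the residual uses the max norm; other norms are possible.
        residual = max(abs(new_Q[s][a] - Q[s][a]) for s in P for a in P[s])
        residuals.append(residual)
        Q = new_Q
        if residual < tol:
            break
    return Q, residuals


def greedy_policy(Q):
    """Pick an action with the largest Q-value in every state (ties go to the first action)."""
    return {s: max(Q[s], key=Q[s].get) for s in Q}


Q, residuals = q_value_iteration(TOY_P)
policy = greedy_policy(Q)
values = {s: max(Q[s].values()) for s in Q}
```

The `residuals` list is what one would plot against the iteration index to see how quickly the updates shrink. On the real environment, the same loop could be pointed at the environment's own transition table instead of `TOY_P`.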
- Q-Value Iteration (QVI): Implement Q-value iteration on the frozen lake environment.
  (a). What is the optimal policy and value function?
  (b). Plot $U_k = ||Q_k - Q_{k-1}||$, where $Q_k$ is the Q-value during the $k^{\mathrm{th}}$ iteration.
  (c). Use the `fancy_visual` function to plot the heat maps of the optimal policy and value function.
- Policy Evaluation: Consider the following policies: $(i)$ the optimal policy obtained from QVI, and $(ii)$ a uniformly random policy where each action is taken with equal probability. Compute the value of these policies:
  (a). By solving a linear system of equations.
  (b). By the iterative approach.
  (c). Which method is better and why?
- Policy Iteration (PI): Implement policy iteration on the frozen lake environment.
  (a). What is the optimal policy and value function?
  (b). Compare the convergence of QVI and PI.
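The two policy evaluation routes named above can be sketched side by side on a small hand-made model. The direct route solves $(I - \gamma P_\pi) v = r_\pi$, and the iterative route repeats $v \leftarrow r_\pi + \gamma P_\pi v$ until the change is small. This is a pure-Python illustration under stated assumptions, not a reference solution: the toy model `TOY_P`, the discount `GAMMA = 0.9`, and every function name below are hypothetical, states are assumed to be the integers `0..n-1`, and a policy is assumed to be a dict `policy[s][a]` of action probabilities. In a notebook, a numerical library's linear solver would normally replace the hand-written elimination.

```python
# Toy model in the shape P[s][a] -> [(prob, next_state, reward, terminated), ...].
# Invented numbers for illustration; this is NOT the FrozenLake map.
TOY_P = {
    0: {0: [(1.0, 0, 0.0, False)],
        1: [(0.8, 1, 0.0, False), (0.2, 0, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)],
        1: [(0.8, 2, 1.0, True), (0.2, 1, 0.0, False)]},
    2: {0: [(1.0, 2, 0.0, True)],
        1: [(1.0, 2, 0.0, True)]},
}
GAMMA = 0.9  # assumed discount factor


def policy_model(P, policy):
    """Build the state-to-state matrix P_pi and expected reward vector r_pi."""
    n = len(P)  # assumes states are labelled 0..n-1
    P_pi = [[0.0] * n for _ in range(n)]
    r_pi = [0.0] * n
    for s in P:
        for a, pi_sa in policy[s].items():
            for prob, s_next, reward, terminated in P[s][a]:
                r_pi[s] += pi_sa * prob * reward
                if not terminated:  # no future value after the episode ends
                    P_pi[s][s_next] += pi_sa * prob
    return P_pi, r_pi


def solve_linear(A, b):
    """Gaussian elimination with partial pivoting (fine for tiny systems)."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def evaluate_exact(P, policy, gamma):
    """Solve (I - gamma * P_pi) v = r_pi directly."""
    P_pi, r_pi = policy_model(P, policy)
    n = len(r_pi)
    A = [[(1.0 if i == j else 0.0) - gamma * P_pi[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear(A, r_pi)


def evaluate_iterative(P, policy, gamma, tol=1e-10, max_iters=100_000):
    """Repeat v <- r_pi + gamma * P_pi v until the max change drops below tol."""
    P_pi, r_pi = policy_model(P, policy)
    n = len(r_pi)
    v = [0.0] * n
    for _ in range(max_iters):
        new_v = [r_pi[s] + gamma * sum(P_pi[s][t] * v[t] for t in range(n))
                 for s in range(n)]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return v


# A uniformly random policy and a hand-picked deterministic one for the toy model.
uniform = {s: {a: 1.0 / len(TOY_P[s]) for a in TOY_P[s]} for s in TOY_P}
greedy = {0: {1: 1.0}, 1: {1: 1.0}, 2: {0: 1.0}}

v_exact_uniform = evaluate_exact(TOY_P, uniform, GAMMA)
v_iter_uniform = evaluate_iterative(TOY_P, uniform, GAMMA)
v_exact_greedy = evaluate_exact(TOY_P, greedy, GAMMA)
```

On this toy model the two routes agree up to the stopping tolerance, which is a useful sanity check before running either one on the full environment.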