dqn dynamic-programming policy-iteration q-learning reinforcement-learning sarsa value-iteration

Reinforcement Learning Algorithm Implementations

Coding assignments for the KTH Reinforcement Learning course (EL2805), 2019. As with all my other repos, this is more an exercise to help me understand the algorithms than useful code. I hope it helps you too!

LAB 1

Dynamic Programming in a finite-horizon, fully observable stochastic MDP

Agent (green) escaping a maze (exit in blue) with walls (black), chased by a monster (red) that follows a uniform random walk and can cross walls: code
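A minimal, generic sketch of finite-horizon dynamic programming (backward induction), not the repo's maze code. It assumes a tabular MDP given as a transition array `P` of shape `(A, S, S)` with `P[a, s, s2] = Pr(s2 | s, a)` and an expected-reward array `R` of shape `(S, A)`. The function name `backward_induction` and the two-state toy MDP (action 0 stays, action 1 switches state, staying in state 1 pays 1) are hypothetical and only illustrate the recursion V_t(s) = max_a [ r(s, a) + sum_s2 P(s2 | s, a) V_{t+1}(s2) ] with V_T = 0.

```python
import numpy as np


def backward_induction(P, R, T):
    """Finite-horizon DP. Returns V of shape (T + 1, S) and a policy of shape (T, S)."""
    n_states = P.shape[1]
    V = np.zeros((T + 1, n_states))              # terminal values V[T] = 0
    policy = np.zeros((T, n_states), dtype=int)
    for t in range(T - 1, -1, -1):               # sweep backwards in time
        # Q[s, a] = r(s, a) + sum_k P[a, s, k] * V[t + 1, k]
        Q = R + np.einsum("ask,k->sa", P, V[t + 1])
        policy[t] = np.argmax(Q, axis=1)         # ties broken towards the lowest action index
        V[t] = np.max(Q, axis=1)
    return V, policy


# Hypothetical toy MDP: action 0 = stay, action 1 = switch to the other state.
P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],    # state 0: no reward for either action
              [1.0, 0.0]])   # state 1: staying pays 1
V, policy = backward_induction(P, R, T=3)
print("V_0:", V[0], "policy at t=0:", policy[0])
```

Unlike the infinite-horizon case, the optimal policy of a finite-horizon problem can depend on the time step, which is why `policy` is indexed by `t` as well as by state.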

Value Iteration in an infinite-horizon, fully observable stochastic MDP

Agent (green) robbing banks (blue) while escaping the police (red), which follows a random walk that never moves away from the agent: code
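A minimal sketch of value iteration for a discounted infinite-horizon MDP, not the repo's bank-robbing code. It assumes a tabular MDP with transitions `P` of shape `(A, S, S)` (`P[a, s, s2] = Pr(s2 | s, a)`) and expected rewards `R` of shape `(S, A)`. The function name `value_iteration`, the discount `gamma = 0.9`, the stopping tolerance and the two-state toy MDP (action 0 stays, action 1 switches, staying in state 1 pays 1) are illustrative assumptions. Each sweep applies the Bellman optimality operator V(s) <- max_a [ r(s, a) + gamma * sum_s2 P(s2 | s, a) V(s2) ] until successive iterates stop changing.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Discounted value iteration. Returns the value function and a greedy (stationary) policy."""
    V = np.zeros(P.shape[1])
    for _ in range(max_iter):
        Q = R + gamma * np.einsum("ask,k->sa", P, V)   # one Bellman backup for all (s, a)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol     # sup-norm change between sweeps
        V = V_new
        if converged:
            break
    Q = R + gamma * np.einsum("ask,k->sa", P, V)
    return V, Q.argmax(axis=1)


# Hypothetical toy MDP: action 0 = stay, action 1 = switch to the other state.
P = np.array([np.eye(2), [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0], [1.0, 0.0]])
V, policy = value_iteration(P, R)
print("V*:", V, "greedy policy:", policy)
```

With gamma = 0.9 the toy MDP has V*(1) = 1 / (1 - gamma) = 10 (stay forever) and V*(0) = gamma * 10 = 9 (switch once, then stay), so the greedy policy is "switch" in state 0 and "stay" in state 1.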

SARSA (following an epsilon-greedy policy) in an infinite-horizon, non-observable stochastic MDP

Policy learned by the agent for every police (red) position: code
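A minimal tabular SARSA sketch with an epsilon-greedy policy on a continuing (non-episodic) task. The environment is a hypothetical deterministic two-state toy rather than the bank-robbing grid, and the function names `step`, `epsilon_greedy` and `sarsa`, the constant step size `alpha`, `epsilon = 0.1`, `gamma = 0.9` and the step budget are all illustrative assumptions. The defining feature is that SARSA is on-policy: the bootstrap term uses `Q[s', a']` for the action `a'` actually picked by the same epsilon-greedy policy.

```python
import numpy as np


def step(s, a):
    """Hypothetical toy environment: action 0 = stay, 1 = switch; staying in state 1 pays 1."""
    s_next = s if a == 0 else 1 - s
    reward = 1.0 if (s == 1 and a == 0) else 0.0
    return s_next, reward


def epsilon_greedy(Q, s, epsilon, rng):
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))   # explore: uniform random action
    return int(np.argmax(Q[s]))                # exploit: greedy action w.r.t. current Q


def sarsa(n_states=2, n_actions=2, gamma=0.9, alpha=0.1, epsilon=0.1,
          n_steps=100_000, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    s = 0
    a = epsilon_greedy(Q, s, epsilon, rng)
    for _ in range(n_steps):
        s_next, r = step(s, a)
        a_next = epsilon_greedy(Q, s_next, epsilon, rng)  # next action from the same policy
        td_target = r + gamma * Q[s_next, a_next]         # on-policy bootstrap
        Q[s, a] += alpha * (td_target - Q[s, a])
        s, a = s_next, a_next
    return Q


Q = sarsa()
print("greedy policy:", Q.argmax(axis=1))
```

Because SARSA evaluates the exploratory policy it follows, its Q-values estimate the value of the epsilon-greedy policy rather than the optimal one; in this toy problem the greedy action derived from them is still "switch" in state 0 and "stay" in state 1.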

Q-Learning (from a uniform random policy) in an infinite-horizon, non-observable stochastic MDP

Agent (green) once again robbing banks (blue) while escaping the police (red), which follows a random walk: code
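A minimal tabular Q-learning sketch in which actions are drawn from a uniform random behaviour policy, as in the heading above. Q-learning is off-policy: the update bootstraps from max_a' Q(s', a') regardless of which action the behaviour policy takes next. The environment is a hypothetical deterministic two-state toy (action 0 stays, action 1 switches, staying in state 1 pays 1), and the names `step` and `q_learning`, the constant step size `alpha`, `gamma = 0.9` and the step budget are illustrative assumptions, not the repo's settings.

```python
import numpy as np


def step(s, a):
    """Hypothetical toy environment: action 0 = stay, 1 = switch; staying in state 1 pays 1."""
    s_next = s if a == 0 else 1 - s
    reward = 1.0 if (s == 1 and a == 0) else 0.0
    return s_next, reward


def q_learning(n_states=2, n_actions=2, gamma=0.9, alpha=0.1,
               n_steps=50_000, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_steps):
        a = int(rng.integers(n_actions))            # behaviour policy: uniform random
        s_next, r = step(s, a)
        td_target = r + gamma * np.max(Q[s_next])   # off-policy target: greedy w.r.t. Q
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next
    return Q


Q = q_learning()
print(np.round(Q, 3))
print("greedy policy:", Q.argmax(axis=1))
```

Since the toy transitions are deterministic, the target for each (s, a) is noise-free and Q converges to Q*, which here is [[8.1, 9.0], [10.0, 8.1]] (rows are states, columns are stay/switch). With stochastic transitions, as in the bank-robbing problem, a decaying step size is normally needed for convergence.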

About

NumPy- and Keras-based re-implementation of basic RL algorithms: DP, VI, PI, SARSA, Q-Learning, DQN


Languages

Python 78.2%, Jupyter Notebook 21.5%, Shell 0.3%