-
- Based on V.Mnih et al. "Playing Atari with Deep Reinforcement Learning", 2013
- Deep Q-Learning implementation.
- Implementations of neural network action-value approximator in TensorFlow.
- Implemented experience replay memory and fixed Q targets.
- CartPole v0 OpenAI gym Q Rewards & Q value NN loss.
-
- Based on H.Hasselt et al. "Deep Reinforcement Learning with Double Q-learning", 2015
- Double Deep Q-Learning implementation.
- Implemented experience replay memory and fixed Q targets.
- Implemented two action-value neural network approximators, for action decision and fixed target.
- CartPole v0 OpenAI gym Q Rewards & Q value NN loss.
-
Deep Deterministic Policy Gradient:
- Based on T.Lillicrap et al. "Continuous control with deep reinforcement learning", 2016
- Deep Deterministic Policy Gradient implementation.
- Implemented action repeat, experience replay memory and fixed targets for Actor/Critic Networks with soft update.
- MountainCarContinuous-v0 solved after 70 episodes.
- MountainCarContinuous-v0 Critic Loss and Rewards.