cs394-rl_project
Final project for CS394R: implementations of Distributed DDPG and the D4PG algorithm.
Code Explanation
- depricated/ (the directory name is spelled this way in the repo) contains our implementation of Distributed DDPG and D4PG using tf_agents.
- depricated/train_eval.py trains and evaluates Distributed DDPG and D4PG based on depricated/prams.py.
- depricated/train_ddpg.py trains and evaluates DDPG.
- depricated/networks/distributional_critic_network.py implements a critic network that returns a probability distribution over returns, and depricated/d4pg/d4pg_agent.py implements the training loop.
- depricated/utils/misc_utils.py implements running an agent to fill a replay buffer.
- Our Distributed DDPG implementation works well on several domains, but this D4PG implementation does not converge. The experiment results below therefore do not come from it.
- In the root, there is another implementation of D4PG, built on an existing repository.
- We directly copied some useful utility functions, such as utils/prioritised_experience_replay.py and utils/env_wrapper.py.
- However, we mostly reimplemented the actor network and critic network, as well as the training code.
- train_eval.py trains and evaluates the D4PG algorithm based on prams.py.
- play.py loads the trained networks and records a video.
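The key step that distinguishes D4PG from DDPG is the distributional critic: instead of a scalar Q-value, the critic outputs a probability distribution over a fixed support of return atoms, and the Bellman target must be projected back onto that support. A minimal NumPy sketch of this categorical projection step (function name and default hyperparameters here are illustrative, not taken from this repository):

```python
import numpy as np

def categorical_projection(rewards, dones, target_probs, gamma=0.99,
                           v_min=-10.0, v_max=10.0, num_atoms=51):
    """Project the Bellman-updated value distribution back onto the
    fixed support {v_min, ..., v_max} (the C51/D4PG projection step)."""
    batch_size = rewards.shape[0]
    delta_z = (v_max - v_min) / (num_atoms - 1)
    support = np.linspace(v_min, v_max, num_atoms)

    # Bellman update of each atom, clipped to the support range.
    tz = rewards[:, None] + gamma * (1.0 - dones[:, None]) * support[None, :]
    tz = np.clip(tz, v_min, v_max)

    # Fractional index of each updated atom on the fixed support.
    b = (tz - v_min) / delta_z
    lower = np.floor(b).astype(int)
    upper = np.minimum(np.ceil(b).astype(int), num_atoms - 1)

    projected = np.zeros((batch_size, num_atoms))
    for i in range(batch_size):
        for j in range(num_atoms):
            if lower[i, j] == upper[i, j]:
                # Atom lands exactly on a support point.
                projected[i, lower[i, j]] += target_probs[i, j]
            else:
                # Split probability mass between the two neighbouring atoms.
                projected[i, lower[i, j]] += target_probs[i, j] * (upper[i, j] - b[i, j])
                projected[i, upper[i, j]] += target_probs[i, j] * (b[i, j] - lower[i, j])
    return projected
```

The critic is trained by minimizing the cross-entropy between its predicted distribution and this projected target, rather than a squared TD error.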
Experiment Results
Pendulum-v0
LunarLanderContinuous-v2
BipedalWalker-v2
Run the Code Yourself
Install requirements
- run
pip install tensorflow
or, for GPU support,
pip install tensorflow-gpu
- run
pip install -r requirements.txt
Training the agent with D4PG
- run
python train_eval.py
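Training uses the prioritised replay copied into utils/prioritised_experience_replay.py, which presumably implements the standard proportional scheme (Schaul et al., 2015). A minimal illustrative sketch of that idea; the class and method names here are hypothetical, not the repository's API:

```python
import numpy as np

class ProportionalReplay:
    """Minimal proportional prioritised replay buffer (illustrative sketch).

    Transitions are sampled with probability proportional to priority**alpha,
    and importance-sampling weights correct for the non-uniform sampling.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer = []
        self.priorities = np.zeros(capacity)
        self.pos = 0  # next write position (ring buffer)

    def add(self, transition, priority=1.0):
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            self.buffer[self.pos] = transition
        self.priorities[self.pos] = priority ** self.alpha
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.buffer)]
        probs = p / p.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights, normalized so the largest is 1.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return idx, [self.buffer[i] for i in idx], weights

    def update_priorities(self, idx, new_priorities):
        # Called after a training step with the new TD errors.
        self.priorities[idx] = np.asarray(new_priorities) ** self.alpha
```

A production implementation would use a sum-tree for O(log n) sampling; this linear version only shows the sampling and weighting logic.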