
Parallel Q-Learning (PQL)

This repository provides a PyTorch implementation of the paper Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation.

Zechu Li*, Tao Chen*, Zhang-Wei Hong, Anurag Ajay, Pulkit Agrawal


📚 Citation

@inproceedings{li2023parallel,
  title={Parallel $Q$-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation},
  author={Li, Zechu and Chen, Tao and Hong, Zhang-Wei and Ajay, Anurag and Agrawal, Pulkit},
  booktitle={International Conference on Machine Learning},
  year={2023},
  organization={PMLR}
}

βš™οΈ Installation

Install ⚡ PQL

  1. Clone the repository:

    git clone git@github.com:Improbable-AI/pql.git
    cd pql
  2. Create Conda environment and install dependencies:

    ./create_conda_env_pql.sh
    pip install -e .
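
A quick way to confirm the editable install succeeded (assuming the package is importable as pql, matching the repository name) is:

    python -c "import pql; print('PQL import OK')"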

Install Isaac Gym

Note: In the original paper, we used Isaac Gym Preview 3 and the task configs at commit ca7a4fb762f9581e39cc2aab644f18a83d6ab0ba of IsaacGymEnvs.

  1. Download Isaac Gym Preview 4 from https://developer.nvidia.com/isaac-gym

  2. Unzip the file:

    tar -xf IsaacGym_Preview_4_Package.tar.gz
  3. Install Isaac Gym:

    cd isaacgym/python
    pip install -e . --no-deps
  4. Install IsaacGymEnvs:

    git clone https://github.com/NVIDIA-Omniverse/IsaacGymEnvs.git
    cd IsaacGymEnvs
    pip install -e . --no-deps
  5. Export LD_LIBRARY_PATH:

    export LD_LIBRARY_PATH=$(conda info --base)/envs/pql/lib/:$LD_LIBRARY_PATH
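
Before moving on, it can help to smoke-test the Isaac Gym setup. A minimal check (assuming the pql conda environment provides the dependencies skipped by --no-deps above) is:

    python -c "import isaacgym; print('Isaac Gym import OK')"
    python -c "import isaacgymenvs; print('IsaacGymEnvs import OK')"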

System Requirements

Warning: Wall-clock efficiency depends heavily on the GPU type and decreases with smaller or fewer GPUs (see Section 4.4 of the paper).

Isaac Gym requires an NVIDIA GPU. To train in the default configuration, we recommend a GPU with at least 10GB of VRAM. On smaller GPUs, you can decrease the number of parallel environments (cfg.num_envs), the batch size (cfg.algo.batch_size), the replay buffer capacity (cfg.algo.memory_size), etc. ⚡ PQL can run on 1, 2, or 3 GPUs (set the GPU IDs via cfg.p_learner_gpu and cfg.v_learner_gpu; the Isaac Gym environment defaults to GPU 0).
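
For example, a run sized for a smaller GPU could shrink these settings with the same Hydra-style key=value overrides used by the training commands below; the values here are purely illustrative and the keys assume the config names listed above:

    python scripts/train_pql.py task=AllegroHand num_envs=2048 algo.batch_size=4096 algo.memory_size=1000000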

📜 Usage

✏️ Logging

We use Weights & Biases (W&B) for logging.

  1. Get a W&B account from https://wandb.ai/site

  2. Get your API key from https://wandb.ai/authorize

  3. Set your API key in the terminal:

    export WANDB_API_KEY=$your-api-key$
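
Alternatively, the standard W&B CLI can store the key for you; wandb login prompts for the key from https://wandb.ai/authorize and saves it locally:

    wandb login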

💡 Train with ⚡ PQL

Run ⚡ PQL on the Allegro Hand task. A full list of tasks in Isaac Gym is available here.

python scripts/train_pql.py task=AllegroHand

Run ⚡ PQL-D (with distributional RL)

python scripts/train_pql.py task=AllegroHand algo.distl=True algo.cri_class=DistributionalDoubleQ

Run ⚡ PQL on a single GPU. The default runs on 2 GPUs, so specify the GPU IDs explicitly.

python scripts/train_pql.py task=AllegroHand algo.num_gpus=1 algo.p_learner_gpu=0 algo.v_learner_gpu=0

Run ⚡ PQL on 3 GPUs. This keeps the Isaac Gym simulation on GPU 0 and places the policy and value learners on GPUs 1 and 2.

python scripts/train_pql.py task=AllegroHand algo.p_learner_gpu=1 algo.v_learner_gpu=2

🔖 Baselines

Run DDPG baseline

python scripts/train_baselines.py algo=ddpg_algo task=AllegroHand

Run SAC baseline

python scripts/train_baselines.py algo=sac_algo task=AllegroHand

Run PPO baseline

python scripts/train_baselines.py algo=ppo_algo task=AllegroHand isaac_param=True

💾 Saving and Loading

Checkpoints are automatically saved as W&B Artifacts.

To load and visualize the policy, run

python scripts/visualize.py task=AllegroHand headless=False num_envs=10 artifact=$team-name$/$project-name$/$run-id$/$version$

πŸ‘ Acknowledgement

We thank the members of the Improbable AI lab for the helpful discussions and feedback on the paper. We are grateful to MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources.


License

MIT License

