A collection of multi-quality policies for continuous control tasks.
It requires:
- Python 3.6+
- mujoco-py, mujoco 200 and a mujoco license. Please follow the mujoco-py installation instructions.
- PyTorch >= 1.8.0
The Python package and its dependencies can be installed using:
pip install git+https://github.com/koulanurag/policybazaar@main#egg=policybazaar
Or
git clone https://github.com/koulanurag/policybazaar.git
cd policybazaar
pip install -e .
>>> import policybazaar, gym, torch
>>> model, model_info = policybazaar.get_policy('d4rl:maze2d-open-v0', pre_trained=1)
>>> model_info
{'score_mean': 122.2, 'score_std': 10.61}
>>> episode_reward = 0
>>> done = False
>>> env = gym.make('d4rl:maze2d-open-v0')
>>> obs = env.reset()
>>> while not done:
...     action = model.actor(torch.tensor(obs).unsqueeze(0).float())
...     obs, reward, done, step_info = env.step(action.data.cpu().numpy()[0])
...     episode_reward += reward
...
>>> episode_reward
120
>>> # Get the dataset corresponding to a policy
>>> dataset = policybazaar.get_dataset('d4rl:maze2d-open-v0', pre_trained=1)
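The single-episode loop above can be wrapped into a small evaluation helper that averages the return over several episodes. The following is a minimal sketch, not the code used to produce the reported scores: `evaluate` and `CountdownEnv` are hypothetical names introduced here, the environment is a stand-in so the snippet runs without mujoco, and the use of the population standard deviation is an assumption.

```python
import statistics


def evaluate(policy_fn, env, episodes=20):
    """Roll out `policy_fn` for `episodes` episodes; return (mean, std) of the returns."""
    returns = []
    for _ in range(episodes):
        obs, done, episode_reward = env.reset(), False, 0.0
        while not done:
            # Same 4-tuple step API as in the usage example above.
            obs, reward, done, _ = env.step(policy_fn(obs))
            episode_reward += reward
        returns.append(episode_reward)
    # Assumption: population standard deviation; the source does not say which is used.
    return statistics.mean(returns), statistics.pstdev(returns)


class CountdownEnv:
    """Hypothetical stand-in environment: fixed horizon, reward equals the action."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        return float(self.t), float(action), self.t >= self.horizon, {}


mean_return, std_return = evaluate(lambda obs: 1.0, CountdownEnv(horizon=5), episodes=20)
print(mean_return, std_return)
```

With a real policy, `policy_fn` could be `lambda obs: model.actor(torch.tensor(obs).unsqueeze(0).float()).data.cpu().numpy()[0]` and `env` the result of `gym.make(...)`, mirroring the loop above.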
- Install:
pip install -e ".[test]"
- Run:
pytest
- 11th May, 2021:
  - Release (alpha3) includes Cassie policies.
- 29th Mar, 2021:
  - Initial release (alpha2) with pre-trained policies for maze2d and some mujoco (gym) environments that are also used in d4rl.
  - Policies were hand-picked.
- 23rd Mar, 2021:
  - Initial release (alpha1) with pre-trained policies for maze2d in d4rl.
The tables below report the performance of the pre-trained models. Each score is the mean ± standard deviation of the episode reward over 20 episodes.
Environment Name | pre_trained=1 (best) | pre_trained=2 | pre_trained=3 | pre_trained=4 (worst) |
---|---|---|---|---|
d4rl:maze2d-open-v0 | 122.2±10.61 | 104.9±22.19 | 18.05±14.85 | 4.85±8.62 |
d4rl:maze2d-medium-v1 | 245.55±272.75 | 203.75±252.61 | 256.65±260.16 | 258.55±262.81 |
d4rl:maze2d-umaze-v1 | 235.5±35.45 | 197.75±58.21 | 23.4±73.24 | 3.2±9.65 |
d4rl:maze2d-large-v1 | 231.35±268.37 | 160.8±201.97 | 50.65±76.94 | 9.95±9.95 |
d4rl:maze2d-open-dense-v0 | 127.18±9.17 | 117.53±10.21 | 63.96±16.03 | 26.82±9.19 |
d4rl:maze2d-medium-dense-v1 | 209.25±190.66 | 192.36±193.29 | 225.54±183.33 | 232.94±184.62 |
d4rl:maze2d-umaze-dense-v1 | 240.22±25.1 | 201.12±21.35 | 121.94±10.71 | 45.5±44.53 |
d4rl:maze2d-large-dense-v1 | 168.83±225.78 | 239.1±208.43 | 204.39±197.96 | 90.89±70.61 |
Environment Name | pre_trained=1 (best) | pre_trained=2 | pre_trained=3 | pre_trained=4 (worst) |
---|---|---|---|---|
d4rl:antmaze-umaze-v0 | 0.0±0.0 | 0.0±0.0 | 0.0±0.0 | 0.0±0.0 |
d4rl:antmaze-medium-diverse-v0 | 0.0±0.0 | 0.0±0.0 | 0.0±0.0 | 0.0±0.0 |
d4rl:antmaze-large-diverse-v0 | 0.0±0.0 | 0.0±0.0 | 0.0±0.0 | 0.0±0.0 |
Environment Name | pre_trained=1 (best) | pre_trained=2 | pre_trained=3 | pre_trained=4 (worst) |
---|---|---|---|---|
HalfCheetah-v2 | 1169.13±80.45 | 1044.39±112.61 | 785.88±303.59 | 94.79±40.88 |
Hopper-v2 | 1995.84±794.71 | 1466.71±497.1 | 1832.43±560.86 | 236.51±1.09 |
Walker2d-v2 | 2506.9±689.45 | 811.28±321.66 | 387.01±42.82 | 162.7±102.14 |
Dependency: pip install -e ".[cassie]"
Environment Name | pre_trained=1 (best) | pre_trained=2 | pre_trained=3 | pre_trained=4 (worst) |
---|---|---|---|---|
cassie:CassieWalkSlow-v0 | 267.36±2.84 | - | - | - |
cassie:CassieWalkFast-v0 | 208.78±68.99 | - | - | - |
cassie:CassieStand-v0 | 289.93±2.03 | - | - | - |
cassie:CassieHop-v0 | 184.27±71.62 | - | - | - |
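The "best" and "worst" column labels do not always match the reported means: for d4rl:maze2d-medium-v1, for instance, pre_trained=4 has the highest mean. When the reported score matters more than the label, the level can be chosen from `model_info` instead. The sketch below assumes a loader with the same call shape as `policybazaar.get_policy` and that every requested level exists for the environment; `best_level` and `fake_get_policy` are hypothetical names, and the stand-in loader is seeded with the d4rl:maze2d-medium-v1 row so the snippet runs without the package.

```python
def best_level(get_policy, env_name, levels=(1, 2, 3, 4)):
    """Return (level, score_mean) for the level with the highest reported mean."""
    scores = {}
    for level in levels:
        # The loader returns (model, model_info), as in the usage example.
        _, info = get_policy(env_name, pre_trained=level)
        scores[level] = info['score_mean']
    top = max(scores, key=scores.get)
    return top, scores[top]


# Hypothetical stand-in loader using the reported d4rl:maze2d-medium-v1 scores.
_REPORTED = {
    1: (245.55, 272.75),
    2: (203.75, 252.61),
    3: (256.65, 260.16),
    4: (258.55, 262.81),
}


def fake_get_policy(env_name, pre_trained=1):
    mean, std = _REPORTED[pre_trained]
    return None, {'score_mean': mean, 'score_std': std}


level, mean = best_level(fake_get_policy, 'd4rl:maze2d-medium-v1')
print(level, mean)
```

With the package installed, `policybazaar.get_policy` could be passed in place of `fake_get_policy`. How the loader behaves for levels marked "-" in the tables is not documented here, so restrict `levels` accordingly.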