tinkoff-ai / CORL

High-quality single-file implementations of SOTA Offline and Offline-to-Online RL algorithms: AWAC, BC, CQL, DT, EDAC, IQL, SAC-N, TD3+BC, LB-SAC, SPOT, Cal-QL, ReBRAC

Home Page: https://arxiv.org/abs/2210.07105


Issue reproducing antmaze-medium-play-v0 results with IQL

egg-west opened this issue

Hi there,
Thank you for releasing the CORL benchmark. I cloned the latest repo and ran the antmaze-medium-play-v0 experiment with the parameters below. However, I got a near-zero normalized reward for the first 430,000 gradient steps.

I did not change the code except for these parameters:

from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Experiment
    device: str = "cpu"
    env: str = "antmaze-medium-play-v0"  # OpenAI gym environment name
    seed: int = 0  # Sets Gym, PyTorch and Numpy seeds
    eval_freq: int = int(1e4)  # How often (time steps) we evaluate
    n_episodes: int = 100  # How many episodes run during evaluation
    max_timesteps: int = int(1e6)  # Max time steps to run environment
    checkpoints_path: str = "./models/iql"  # Save path
    load_model: str = ""  # Model load file name, "" doesn't load
    # IQL
    buffer_size: int = 10_000_000  # Replay buffer size
    batch_size: int = 256  # Batch size for all networks
    discount: float = 0.99  # Discount factor
    tau: float = 0.005  # Target network update rate
    beta: float = 10.0  # Inverse temperature. Small beta -> BC, big beta -> maximizing Q
    iql_tau: float = 0.9  # Coefficient for asymmetric loss
    iql_deterministic: bool = False  # Use deterministic actor
    normalize: bool = True  # Normalize states
    normalize_reward: bool = False  # Normalize reward
    # Wandb logging
    project: str = "CORL-default"
    group: str = "IQL-D4RL"
    name: str = "IQL"
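
(As an aside: in the CORL single-file scripts this dataclass is consumed through pyrallis, so fields can also be overridden from the command line rather than by editing the file. A minimal sketch, assuming the entry point is wrapped with @pyrallis.wrap() as in the CORL implementations; the exact flag spelling is pyrallis's convention, not verified output from this script:)

import pyrallis


@pyrallis.wrap()
def train(config: TrainConfig):
    # pyrallis builds a CLI from the dataclass fields, so a run like
    #   python iql.py --env antmaze-medium-play-v0 --normalize_reward true
    # overrides the defaults above without touching the source file.
    ...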

The results are as follows:

 % python iql.py
objc[33597]: Class GLFWApplicationDelegate is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa13778) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc7e8). One of the two will be used. Which one is undefined.
objc[33597]: Class GLFWWindowDelegate is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa13700) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc810). One of the two will be used. Which one is undefined.
objc[33597]: Class GLFWContentView is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa137a0) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc860). One of the two will be used. Which one is undefined.
objc[33597]: Class GLFWWindow is implemented in both /Users/xxx/.mujoco/mujoco210/bin/libglfw.3.dylib (0x11aa13818) and /opt/anaconda3/envs/iql2/lib/python3.10/site-packages/glfw/libglfw.3.dylib (0x11aabc8d8). One of the two will be used. Which one is undefined.
Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
pybullet build time: Oct 16 2022 01:59:14
/opt/anaconda3/envs/iql2/lib/python3.10/site-packages/gym/envs/registration.py:505: UserWarning: WARN: The environment antmaze-medium-play-v0 is out of date. You should consider upgrading to version `v2` with the environment ID `antmaze-medium-play-v2`.
  logger.warn(
/Users/xxx/Documents/project_offlineexploration/D4RL_6330b4e09e36a80f4b706a3885d59d97745c05a9/d4rl/locomotion/ant.py:180: UserWarning: This environment is deprecated. Please use the most recent version of this environment.
  offline_env.OfflineEnv.__init__(self, **kwargs)
Target Goal:  (20.64647417679362, 21.089515421327548)
/opt/anaconda3/envs/iql2/lib/python3.10/site-packages/gym/spaces/box.py:84: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
load datafile: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:03<00:00,  2.14it/s]
Dataset size: 999092
Checkpoints path: ./models/iql
---------------------------------------
Training IQL, Env: antmaze-medium-play-v0, Seed: 0
---------------------------------------
wandb: Currently logged in as: lxu. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.13.4 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.21
wandb: Run data is saved locally in /Users/xxx/Documents/default_repo/CORL/algorithms/wandb/run-20221019_133015-2d1a2d9d-8f35-4295-bac7-e39fa293699c
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run IQL
wandb: ⭐️ View project at https://wandb.ai/xxx/CORL-default
wandb: 🚀 View run at https://wandb.ai/xxx/CORL-default/runs/2d1a2d9d-8f35-4295-bac7-e39fa293699c
wandb: WARNING Calling wandb.run.save without any arguments is deprecated.Changes to attributes are automatically persisted.

[attached plot iql_results: evaluation normalized score staying near zero]
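
(For reference, the normalized reward plotted here is D4RL's normalized score, where 0 corresponds to a random policy and 100 to the reference policy. A minimal sketch of how it is typically computed, assuming env is the D4RL environment and eval_returns holds the raw episode returns from the evaluation loop:)

import numpy as np

# get_normalized_score is D4RL's standard metric: 0.0 ~ random policy,
# 1.0 ~ reference policy. `env` and `eval_returns` are assumed to come
# from the surrounding evaluation code.
normalized_score = 100.0 * env.get_normalized_score(np.asarray(eval_returns)).mean()
print(f"normalized score: {normalized_score:.1f}")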

Hi,

Thanks for reporting this. I've carefully checked the configuration used in the benchmarks, and it seems that normalize_reward should also be set to True. Can you try with this flag enabled and let us know if it helps?
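
For context, here is a minimal sketch of the kind of preprocessing this flag typically enables. The function and names are an illustration rather than a verbatim copy of iql.py; the antmaze branch follows the IQL paper's trick of shifting the sparse 0/1 reward down by 1 so that non-goal transitions are penalized:

import numpy as np


def modify_reward(dataset, env_name, max_episode_steps=1000):
    # Hypothetical reconstruction for illustration; see iql.py for the
    # actual implementation.
    if "antmaze" in env_name:
        # IQL-paper trick: every non-goal step yields -1 instead of 0,
        # which gives the critic a useful signal on sparse-reward mazes.
        dataset["rewards"] = dataset["rewards"] - 1.0
    else:
        # For locomotion tasks, rescale rewards by the range of episode
        # returns so value magnitudes are comparable across datasets.
        returns, ret, length = [], 0.0, 0
        for r, done in zip(dataset["rewards"], dataset["terminals"]):
            ret, length = ret + float(r), length + 1
            if done or length == max_episode_steps:
                returns.append(ret)
                ret, length = 0.0, 0
        dataset["rewards"] /= max(returns) - min(returns)
        dataset["rewards"] *= max_episode_steps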

@DT6A FYI, please make sure that the config files exactly match the ones used in the wandb reports.

Thanks for your quick response. Let me try this.

My problem was solved by setting normalize_reward to True.

@DT6A waiting for the updated configs; then we can close the issue.

@egg-west thank you

Thanks for your report. The antmaze configs are fixed now in #8.