takuseno / d3rlpy

An offline deep reinforcement learning library

Home Page: https://takuseno.github.io/d3rlpy

ValueError: too many values to unpack (expected 4) when using hopper-medium-v0 environment

sky-story opened this issue

Hi, it's me again.
When I run the following example:

import d3rlpy

# prepare dataset
dataset, env = d3rlpy.datasets.get_d4rl('hopper-medium-v0')

# prepare algorithm
cql = d3rlpy.algos.CQLConfig().create(device='cuda:0')

# train
cql.fit(
    dataset,
    n_steps=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)

The first training epoch completes, but evaluation then fails: the step function in the D4RL environment wrapper receives five values where it expects the traditional four. The full output follows:

Warning: Flow failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'flow'
Warning: CARLA failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'carla'
Warning: GymBullet failed to import. Set the environment variable D4RL_SUPPRESS_IMPORT_ERROR=1 to suppress this message.
No module named 'pybullet_envs'
/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/gym/envs/registration.py:555: UserWarning: WARN: The environment hopper-medium-v0 is out of date. You should consider upgrading to version `v2`.
  logger.warn(
/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/gym/envs/mujoco/mujoco_env.py:190: UserWarning: WARN: This version of the mujoco environments depends on the mujoco-py bindings, which are no longer maintained and may stop working. Please upgrade to the v4 versions of the environments (which depend on the mujoco python bindings instead), unless you are trying to precisely replicate previous works).
  logger.warn(
/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/d4rl/gym_mujoco/gym_envs.py:13: UserWarning: This environment is deprecated. Please use the most recent version of this environment.
  offline_env.OfflineEnv.__init__(self, **kwargs)
/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/gym/spaces/box.py:127: UserWarning: WARN: Box bound precision lowered by casting to float32
  logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
load datafile: 100%|████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 12.77it/s]
2024-05-19 18:41.08 [info     ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) observation_signature=Signature(dtype=[dtype('float32')], shape=[(11,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-05-19 18:41.08 [info     ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1>
2024-05-19 18:41.08 [info     ] Action size has been automatically determined. action_size=3
2024-05-19 18:41.09 [info     ] dataset info                   dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(11,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=3)
2024-05-19 18:41.09 [info     ] Directory is created at d3rlpy_logs/CQL_20240519184109
2024-05-19 18:41.09 [debug    ] Building models...
2024-05-19 18:41.10 [debug    ] Models have been built.
2024-05-19 18:41.10 [info     ] Parameters                     params={'observation_shape': [11], 'action_size': 3, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.0001, 'critic_learning_rate': 0.0003, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.0001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False, 'max_q_backup': False}}}
Epoch 1/10: 100%|██████████████████████| 10000/10000 [03:08<00:00, 53.09it/s, critic_loss=-39.5, conservative_loss=-42.8, alpha=0.638, actor_loss=-68.7, temp=0.673, temp_loss=2.04]
Traceback (most recent call last):
  File "my_cql.py", line 10, in <module>
    cql.fit(
  File "/root/d3rlpy/d3rlpy/algos/qlearning/base.py", line 422, in fit
    results = list(
  File "/root/d3rlpy/d3rlpy/algos/qlearning/base.py", line 588, in fitter
    test_score = evaluator(self, dataset)
  File "/root/d3rlpy/d3rlpy/metrics/evaluators.py", line 544, in __call__
    return evaluate_qlearning_with_environment(
  File "/root/d3rlpy/d3rlpy/metrics/utility.py", line 65, in evaluate_qlearning_with_environment
    observation, reward, done, truncated, _ = env.step(action)
  File "/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/gym/wrappers/time_limit.py", line 50, in step
    observation, reward, terminated, truncated, info = self.env.step(action)
  File "/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/d4rl/utils/wrappers.py", line 165, in step
    next_obs, reward, done, info = wrapped_step
ValueError: too many values to unpack (expected 4)

Can you tell me how to fix this? Thank you!

I think you're using Farama's D4RL package in your experiment. Please try this:

$ pip uninstall D4RL
$ d3rlpy install d4rl

This way, d3rlpy installs my fork of the D4RL package from https://github.com/takuseno/D4RL, which fixes some of the incompatibilities.
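
For context, the traceback above shows the D4RL wrapper unpacking four values from a step result that holds five. The sketch below illustrates one way code can tolerate both shapes. `normalize_step` is a hypothetical helper written for this illustration; it is not part of d3rlpy, D4RL, or gym, and it is not necessarily how the fork handles the mismatch. Reading the `TimeLimit.truncated` key from `info` is likewise an assumption about how an older wrapper might signal a time-limit cutoff.

```python
from typing import Any, Dict, Tuple

# Five-value form: (observation, reward, terminated, truncated, info)
StepResult = Tuple[Any, float, bool, bool, Dict[str, Any]]


def normalize_step(result: tuple) -> StepResult:
    """Convert a 4- or 5-value step result into the 5-value form.

    Hypothetical helper for illustration only.
    """
    if len(result) == 5:
        # Already (observation, reward, terminated, truncated, info).
        return result
    if len(result) == 4:
        observation, reward, done, info = result
        # Assumption: a time-limit cutoff, if any, is flagged in info.
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = bool(done) and not truncated
        return observation, reward, terminated, truncated, info
    raise ValueError(f"unexpected step result of length {len(result)}")


# Old-style 4-value result becomes a 5-value result.
old_style = ([0.0, 1.0], 1.5, True, {})
observation, reward, terminated, truncated, info = normalize_step(old_style)
```

Unpacking through a helper like this avoids the "too many values to unpack" error regardless of which API the wrapped environment follows, at the cost of guessing how truncation was reported under the older API.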

Thanks for the suggestion! I originally installed the D4RL package using the following commands:

pip install d3rlpy
pip install git+https://github.com/Farama-Foundation/D4RL
pip install -U gym
pip uninstall pybullet

However, it appears that this setup defaults to Farama's D4RL package.
I'll give your approach a try using d3rlpy install d4rl.
By the way, it seems that the command d3rlpy install d4rl requires python 3.9?

I don't think so. Did you see any errors?

@takuseno Sorry for the delayed response. Yes, I have some evidence that this issue exists. I am using Python 3.8, and when I run the command to install d4rl, I get the following error:

(drl_project) root@autodl-container-362e44a99f-d27d57c6:~# d3rlpy install d4rl
Traceback (most recent call last):
  File "/root/miniconda3/envs/drl_project/bin/d3rlpy", line 5, in <module>
    from d3rlpy.cli import cli
  File "/root/miniconda3/envs/drl_project/lib/python3.8/site-packages/d3rlpy/cli.py", line 352, in <module>
    name: list[str], upgrade: bool = False, check: bool = True
TypeError: 'type' object is not subscriptable

I suspect this is because subscripting built-in types in annotations, as in list[str], is only supported in Python 3.9 and above.
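
A small sketch of that behavior, assuming nothing about d3rlpy's actual code: `builtin_generics_supported` and `join_names` are hypothetical names used only for this illustration. Evaluating `list[str]` raises `TypeError` on Python 3.8, while `typing.List[str]` works on both 3.8 and later versions.

```python
import sys
from typing import List


def builtin_generics_supported() -> bool:
    """Return True if built-in types such as list can be subscripted."""
    try:
        list[str]  # raises TypeError on Python 3.8 and earlier
        return True
    except TypeError:
        return False


def join_names(names: List[str]) -> str:
    """Annotated with typing.List, which is also valid on Python 3.8."""
    return ", ".join(names)


print(sys.version_info[:2], builtin_generics_supported())
print(join_names(["d4rl", "atari"]))
```

Using `typing.List` is one common way to keep such a signature importable on Python 3.8; the change made in the referenced commit may differ.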

Thanks for following up on this! Yeah, you're right. I've updated these lines in the latest commit (18d710a).

If you install d3rlpy from source, it should work with Python 3.8.

That's great! I'll close this issue.