takuseno / d3rlpy

An offline deep reinforcement learning library

Home Page: https://takuseno.github.io/d3rlpy


[QUESTION] multidimensional states and actions

bzeni1 opened this issue · comments

When attempting to create an MDPDataset in d3rlpy with data shaped, for example, as (100, 5) for observations, (100, 5) for actions, (100,) for rewards, (100, 5) for next observations, and (100,) for terminals, all consistent with one another, I encounter an error: "ValueError: operands could not be broadcast together with shapes (500,) (100,)." The error occurs during dataset creation even though the dimensions match, which suggests a broadcasting issue internal to d3rlpy, possibly a bug in how the library handles multidimensional input shapes for MDPDataset.

When I select only one feature, I am able to create the MDPDataset with shapes like (100, 1). However, I run into another error later in the code when I try to use, for example, the DDPG model.
The error message states that the DDPG model requires 'config' and 'device' arguments, but according to the documentation, DDPG() does not take these arguments. When I pass the arguments listed in the documentation instead, I get an 'unexpected keyword argument' error.

Do you think this could be a problem with the library? I have already tried several Python environments and got the same errors.
The library version used is d3rlpy 2.4.0.

@bzeni1 Hi, could you share a minimal example so that I can reproduce your issue? It sounds like your code is simply incorrect.

btw, when you instantiate algorithms, you need to do as follows:

ddpg = d3rlpy.algos.DDPGConfig().create()
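
In d3rlpy v2, hyperparameters go on the config object and the device is passed to `create()`. A minimal sketch (the hyperparameter values below are purely illustrative, not recommendations):

```python
import d3rlpy

config = d3rlpy.algos.DDPGConfig(
    actor_learning_rate=3e-4,   # illustrative value
    critic_learning_rate=3e-4,  # illustrative value
    batch_size=256,
)
ddpg = config.create(device="cpu")  # or e.g. "cuda:0" for GPU training
```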

@takuseno Hi, please find my code below. What could be the problem? Thank you in advance for your assistance.

processed_data = race_data.copy()
settings_columns = [...]  # 22 selected columns from my dataset
processed_data['reward'] = processed_data['ACCELERATION_m_s2']

# states: the current value of each selected column
for col in settings_columns:
    processed_data[f'state_{col}'] = processed_data[col]

# actions: the change applied to each column at this step
for col in settings_columns:
    processed_data[f'action_{col}'] = processed_data[col].diff().fillna(0)

# next states: the value of each column at the following step
for col in settings_columns:
    processed_data[f'next_{col}'] = processed_data[col].shift(-1)

# end of an episode (race)
processed_data['done'] = processed_data['race_num'].diff(-1) != 0

processed_data = processed_data[processed_data['done'] == False]

print("After filtering rows:", processed_data.shape)

print("States shape:", processed_data[settings_columns].shape)
print("Actions shape:", processed_data[[f'action_{col}' for col in settings_columns]].shape)
print("Rewards shape:", processed_data['reward'].shape)
print("Next states shape:", processed_data[[f'next_{col}' for col in settings_columns]].shape)
print("Dones shape:", processed_data['done'].shape)

**Output**
States shape: (17642, 22)
Actions shape: (17642, 22)
Rewards shape: (17642,)
Next states shape: (17642, 22)
Dones shape: (17642,)

states = processed_data[[f'state_{col}' for col in settings_columns if f'state_{col}' in processed_data.columns]].to_numpy()
actions = processed_data[[col for col in processed_data.columns if col.startswith('action_')]].to_numpy()
rewards = processed_data['reward'].to_numpy()
next_states = processed_data[[col for col in processed_data.columns if col.startswith('next_')]].to_numpy()
dones = processed_data['done'].to_numpy()

#next step:

dataset = MDPDataset(states, actions, rewards, next_states, dones)

#ValueError: operands could not be broadcast together with shapes (388124,) (17642,) 

Thanks for sharing your code. It looks like next_states is unnecessary; note that 388124 = 17642 × 22, so the extra 2-D next_states array was likely flattened and broadcast against the 1-D arrays, producing the error above. It needs to be as follows:

dataset = MDPDataset(states, actions, rewards, dones)
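
For reference, a self-contained sketch of the four-argument form with synthetic placeholder data; d3rlpy v2 derives next observations internally from consecutive steps within each episode, which is why next_states is not passed:

```python
import numpy as np
from d3rlpy.dataset import MDPDataset

# toy data mirroring the shapes printed above (values are placeholders)
states = np.random.random((100, 22)).astype(np.float32)
actions = np.random.random((100, 22)).astype(np.float32)
rewards = np.random.random(100).astype(np.float32)
terminals = np.zeros(100, dtype=np.float32)
terminals[-1] = 1.0  # mark the end of the single episode

dataset = MDPDataset(states, actions, rewards, terminals)
```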

Thanks for your advice. After removing next_states, I am encountering a new issue:

ValueError: Either episodes or env must be provided to determine signatures. Or specify signatures directly.

However, I already defined the episode boundaries with the 'done' flags, yet I still don't know how to determine the episodes. What do you think?

My guess is that dones is all zeros, thus episodes couldn't be found. You need to set up dones correctly.
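
One likely culprit in the snippet above: the line `processed_data = processed_data[processed_data['done'] == False]` drops every terminal row, so the dones array that reaches MDPDataset is all False. A sketch of one way to keep the episode ends, reusing the column names from that snippet:

```python
# mark the last row of each race as terminal (shift(-1) yields NaN on the
# final row, which compares unequal, so the last race is flagged as well)
processed_data['done'] = (
    processed_data['race_num'] != processed_data['race_num'].shift(-1)
)

# do NOT filter these rows out; MDPDataset needs at least one done=True
# per episode to split the data into episodes
dones = processed_data['done'].to_numpy()
print("episode ends:", dones.sum())  # should equal the number of races
```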

Hi

I think I am running into a similar issue. I have two datasets, and for both of them all the dimensions are the same:
observations: (5000, 4), actions: (5000, 2), rewards: (5000,), terminals: (5000,)

But with one dataset, the fit function for IQL fails, although I am getting a different error. I can see that both datasets have some terminals = 1.
Any suggestions for where an error like this might come up?

```

File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 409, in fit
    results = list(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 543, in fitter
    loss = self.update(batch)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 863, in update
    loss = self._impl.update(torch_batch, self._grad_step)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/torch_utility.py", line 365, in wrapper
    return f(self, *args, **kwargs)  # type: ignore
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/base.py", line 70, in update
    return self.inner_update(batch, grad_step)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
    metrics.update(self.update_critic(batch))
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
    loss = self.compute_critic_loss(batch, q_tpn)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/algos/qlearning/torch/iql_impl.py", line 73, in compute_critic
_loss
    q_loss = self._q_func_forwarder.compute_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/ensemble_q_function.py", line 256, in
 compute_error
    return compute_ensemble_q_function_error(
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/ensemble_q_function.py", line 96, in 
compute_ensemble_q_function_error
    loss = forwarder.compute_error( 
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/mean_q_function.py", line 130, in com
pute_error
    value = self._q_func(observations, actions).q_value
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/base.py", line 35, in __call__
    return super().__call__(x, action)  # type: ignore
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/q_functions/mean_q_function.py", line 99, in forw
ard
    q_value=self._fc(self._encoder(x, action)),
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/encoders.py", line 41, in __call__
    return super().__call__(x, action)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/d3rlpy/models/torch/encoders.py", line 284, in forward
    return self._layers(x)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/rohan/anaconda3/envs/franka/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (256x6 and 5x256)

```
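
Reading the final line: mat1 is the critic input batch (256 rows, 6 features, i.e. the 4-dim observation concatenated with the 2-dim action by the critic encoder), while mat2 is the first linear layer's weight, built for 5 input features. That mismatch suggests the network was constructed against data with different dimensions than the batch being fed. A minimal synthetic repro with the reported shapes (placeholder data, illustrative only) can help isolate whether the dataset itself is at fault:

```python
import numpy as np
import d3rlpy

# stand-in data matching the reported shapes: 4-dim observations, 2-dim actions
observations = np.random.random((5000, 4)).astype(np.float32)
actions = np.random.random((5000, 2)).astype(np.float32)
rewards = np.random.random(5000).astype(np.float32)
terminals = np.zeros(5000, dtype=np.float32)
terminals[999::1000] = 1.0  # an episode end every 1000 steps

dataset = d3rlpy.dataset.MDPDataset(observations, actions, rewards, terminals)

iql = d3rlpy.algos.IQLConfig().create()
iql.fit(dataset, n_steps=1000, n_steps_per_epoch=500)
```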

@rohanblueboybaijal Sorry for the late response. Could you share a minimal example so that I can reproduce your error?

Let me close this issue since the initial question should be resolved. Feel free to open a new issue to follow up.