n_steps dqn performs worse. bug?
davidenitti opened this issue
Davide Nitti commented
Describe the bug
I modified DQN to enable n-step returns, but I get worse results. Am I missing something?
To Reproduce
Use DQN with the following function, defining self.n_steps in __init__:
def _fit_standard(self, dataset):
    self._replay_memory.add(dataset, n_steps_return=self.n_steps,
                            gamma=self.mdp_info.gamma)
    if self._replay_memory.initialized:
        state, action, reward, next_state, absorbing, _ = \
            self._replay_memory.get(self._batch_size())

        if self._clip_reward:
            reward = np.clip(reward, -1, 1)

        q_next = self._next_q(next_state, absorbing)
        gamma = self.mdp_info.gamma ** self.n_steps * (1 - absorbing)
        q = reward + gamma * q_next

        self.approximator.fit(state, action, q, **self._fit_params)
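For context, this target assumes the replay memory returns the truncated discounted reward sum, so that the TD target is sum_{k=0}^{n-1} gamma^k r_{t+k} + gamma^n * (1 - absorbing) * max_a Q(s_{t+n}, a). A minimal sketch of what I expect (the helper names here are mine, not MushroomRL's):

```python
import numpy as np

def n_step_return(rewards, gamma, n_steps):
    """Truncated discounted return: sum_{k=0}^{n-1} gamma^k * r_{t+k}.

    `rewards` holds the raw per-step rewards starting at time t; this is
    the quantity I assume ReplayMemory.get returns as `reward` when
    n_steps_return is set.
    """
    discounts = gamma ** np.arange(n_steps)
    return float(np.dot(discounts, rewards[:n_steps]))

def n_step_target(n_return, q_next, absorbing, gamma, n_steps):
    """TD target bootstrapped n steps ahead, matching the snippet above."""
    return n_return + (gamma ** n_steps) * (1 - absorbing) * q_next
```

This matches the `gamma ** self.n_steps` bootstrap in the snippet, so the pieces at least look consistent to me.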
Expected behavior
DQN with 2 or 3 steps is worse than 1-step DQN on Atari Breakout and LunarLander; I'm not sure if it's a bug or if it's supposed to be worse. In any case, it would be nice to have n-step DQN implemented in mushroom_rl.
System information (please complete the following information):
- MushroomRL version 1.9

Thanks!