n_steps dqn performs worse. bug?
davidenitti opened this issue
Davide Nitti commented
Describe the bug
I modified DQN to enable n-step returns, but I get worse results. Am I missing something?
To Reproduce
Use DQN with the following function, defining self.n_steps in __init__:
def _fit_standard(self, dataset):
    self._replay_memory.add(dataset, n_steps_return=self.n_steps,
                            gamma=self.mdp_info.gamma)
    if self._replay_memory.initialized:
        state, action, reward, next_state, absorbing, _ = \
            self._replay_memory.get(self._batch_size())

        if self._clip_reward:
            reward = np.clip(reward, -1, 1)

        q_next = self._next_q(next_state, absorbing)
        gamma = self.mdp_info.gamma ** self.n_steps * (1 - absorbing)
        q = reward + gamma * q_next

        self.approximator.fit(state, action, q, **self._fit_params)
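For context, this target assumes the replay memory returns the truncated discounted reward sum, so that the TD target is sum_{k=0}^{n-1} gamma^k r_{t+k} + gamma^n * (1 - absorbing) * max_a Q(s_{t+n}, a). A minimal sketch of what I expect (the helper names here are mine, not MushroomRL's):

```python
import numpy as np

def n_step_return(rewards, gamma, n_steps):
    """Truncated discounted return: sum_{k=0}^{n-1} gamma^k * r_{t+k}.

    `rewards` holds the raw per-step rewards starting at time t; this is
    the quantity I assume ReplayMemory.get returns as `reward` when
    n_steps_return is set.
    """
    discounts = gamma ** np.arange(n_steps)
    return float(np.dot(discounts, rewards[:n_steps]))

def n_step_target(n_return, q_next, absorbing, gamma, n_steps):
    """TD target bootstrapped n steps ahead, matching the snippet above."""
    return n_return + (gamma ** n_steps) * (1 - absorbing) * q_next
```

This matches the `gamma ** self.n_steps` bootstrap in the snippet, so the pieces at least look consistent to me.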
Expected behavior
DQN with 2 or 3 steps is worse than 1-step DQN on Atari Breakout and LunarLander; I'm not sure if it's a bug or if it's supposed to be worse. In any case, it would be nice to have n-step DQN implemented in mushroom_rl.
System information (please complete the following information):
- MushroomRL version 1.9

Thanks!