`TrajectoryCollector` with discount masking if terminal
RobertTLange opened this issue
Robert Tjarko Lange commented
Write a class that collects trajectories and returns a `NamedTuple` of the collected data. This should include a buffer of state transition tuples `(s_t, a_t, s_t_1, r_t, d_t)`. Problem: how can this be made general enough that additional stats (e.g. `log_prob`) can also be stored? Should the agent return these from `actor_step`?
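
One possible shape for this, as a rough sketch rather than a settled design: `actor_step` returns the action together with a dict of extra stats, and the collector stores whatever keys it receives, so `log_prob` and similar values need no special handling. The `(action, stats)` return signature, the `env_reset`/`env_step` callables, the `extras` and `discounts` fields, and the toy environment are all assumptions made for illustration. The discount is masked to zero on terminal steps, as in the issue title.

```python
from typing import Any, Callable, Dict, List, NamedTuple, Tuple


class Trajectory(NamedTuple):
    states: List[Any]              # s_t
    actions: List[Any]             # a_t
    next_states: List[Any]         # s_t_1
    rewards: List[float]           # r_t
    dones: List[bool]              # d_t
    discounts: List[float]         # gamma, masked to 0.0 where d_t is True
    extras: Dict[str, List[Any]]   # agent-specific stats, e.g. log_prob


class TrajectoryCollector:
    """Hypothetical collector; all argument names are assumptions."""

    def __init__(
        self,
        env_reset: Callable[[], Any],
        env_step: Callable[[Any, Any], Tuple[Any, float, bool]],
        actor_step: Callable[[Any], Tuple[Any, Dict[str, Any]]],
        gamma: float = 0.99,
    ):
        self.env_reset = env_reset
        self.env_step = env_step
        self.actor_step = actor_step
        self.gamma = gamma

    def collect(self, num_steps: int) -> Trajectory:
        states, actions, next_states = [], [], []
        rewards, dones, discounts = [], [], []
        extras: Dict[str, List[Any]] = {}
        state = self.env_reset()
        for _ in range(num_steps):
            # Assumed contract: the agent returns (action, dict of stats).
            action, stats = self.actor_step(state)
            next_state, reward, done = self.env_step(state, action)
            states.append(state)
            actions.append(action)
            next_states.append(next_state)
            rewards.append(reward)
            dones.append(done)
            # Discount masking: no bootstrapping past a terminal step.
            discounts.append(0.0 if done else self.gamma)
            # Store any stat the agent chooses to return, keyed by name.
            for key, value in stats.items():
                extras.setdefault(key, []).append(value)
            state = self.env_reset() if done else next_state
        return Trajectory(
            states, actions, next_states, rewards, dones, discounts, extras
        )


# Toy usage: the state is a counter and an episode ends when it reaches 3.
def toy_reset() -> int:
    return 0


def toy_step(state: int, action: int) -> Tuple[int, float, bool]:
    next_state = state + 1
    return next_state, 1.0, next_state >= 3


def toy_actor_step(state: int) -> Tuple[int, Dict[str, Any]]:
    return 0, {"log_prob": -0.5}


collector = TrajectoryCollector(toy_reset, toy_step, toy_actor_step, gamma=0.9)
traj = collector.collect(num_steps=5)
```

Keeping the extra stats in a dict means the `NamedTuple` fields stay fixed while each agent decides what to log. The trade-off is that every `actor_step` call has to return the same keys, otherwise the lists in `extras` end up with different lengths.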