`TrajectoryCollector` with discount masking if terminal

Question


RobertTLange opened this issue 3 years ago

Robert Tjarko Lange commented 3 years ago

Write a class that collects trajectories and returns a NamedTuple of the collected data. It should include a buffer of state-transition tuples (s_t, a_t, s_{t+1}, r_t, d_t). Open problem: how to make it general enough that other statistics (e.g. log_prob) can also be stored. Should the agent return these from actor_step?
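One possible shape for this, sketched in plain Python under a few assumptions: the names `Trajectory`, `add`, `get`, `reset`, `extras` and `discounted_returns` are hypothetical, and "discount masking if terminal" is read as storing `discount_t = gamma * (1 - d_t)`, so the discount is zero after a terminal step. Extra statistics such as `log_prob` are passed as keyword arguments and kept in a dict, so new stats need no change to the collector.

```python
from typing import Any, Dict, List, NamedTuple


class Trajectory(NamedTuple):
    """Collected transitions; `extras` holds optional per-step stats (e.g. log_prob)."""
    obs: List[Any]         # s_t
    action: List[Any]      # a_t
    next_obs: List[Any]    # s_{t+1}
    reward: List[float]    # r_t
    done: List[bool]       # d_t
    discount: List[float]  # gamma * (1 - d_t): masked to 0.0 at terminal steps
    extras: Dict[str, List[Any]]


class TrajectoryCollector:
    """Hypothetical collector for (s_t, a_t, s_{t+1}, r_t, d_t) plus arbitrary extras."""

    def __init__(self, gamma: float = 0.99):
        self.gamma = gamma
        self.reset()

    def reset(self) -> None:
        self._steps: List[tuple] = []
        self._extras: Dict[str, List[Any]] = {}

    def add(self, obs, action, next_obs, reward, done, **extras) -> None:
        # Every keyword argument (e.g. log_prob=...) gets its own list, so the
        # set of stored statistics is decided by the caller, not the collector.
        # Keys must match the first step so all lists stay aligned with rewards.
        if self._steps and set(extras) != set(self._extras):
            raise ValueError("extras keys must be identical at every step")
        self._steps.append((obs, action, next_obs, float(reward), bool(done)))
        for key, value in extras.items():
            self._extras.setdefault(key, []).append(value)

    def get(self) -> Trajectory:
        if self._steps:
            obs, action, next_obs, reward, done = map(list, zip(*self._steps))
        else:
            obs, action, next_obs, reward, done = [], [], [], [], []
        # Discount masking: a terminal step contributes no future value.
        discount = [self.gamma * (1.0 - float(d)) for d in done]
        extras = {k: list(v) for k, v in self._extras.items()}
        return Trajectory(obs, action, next_obs, reward, done, discount, extras)


def discounted_returns(traj: Trajectory, bootstrap: float = 0.0) -> List[float]:
    """G_t = r_t + discount_t * G_{t+1}, computed backwards over the buffer."""
    returns, g = [], bootstrap
    for r, disc in zip(reversed(traj.reward), reversed(traj.discount)):
        # discount_t is 0.0 after a terminal step, so returns do not
        # leak across episode boundaries inside one buffer.
        g = r + disc * g
        returns.append(g)
    return returns[::-1]
```

On the actor_step question, one option (again an assumption, not an existing API) is for the agent to return `(action, info)` with `info = {"log_prob": ...}`, which the caller forwards unchanged via `collector.add(obs, action, next_obs, reward, done, **info)`. Stacking the lists into arrays (e.g. with `numpy.stack`) would be a natural next step if batched tensors are needed.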