Add A2C implementation
RobertTLange opened this issue
Robert Tjarko Lange commented
Reminder to myself: to do after my internship.
Mostly needed for the meta-bandit and gridworld tasks.
David Slayback commented
Suggestion: you could probably just implement it as PPO with fixed hyperparameters (GAE λ = 1, no advantage normalization, 1 epoch, 1 minibatch, no value clipping), as per "A2C is a Special Case of PPO".
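To make the idea concrete, here is a rough sketch in plain Python, not code from this repo. The names `A2C_AS_PPO`, `gae_advantages`, `nstep_advantages`, and `ppo_clip_objective` are hypothetical and only illustrate the settings listed above. It checks two things: with λ = 1, GAE reduces to the bootstrapped n-step advantage that A2C uses, and with a single epoch over a single minibatch the probability ratio is 1 at the only update, so the clipped surrogate has the same gradient as the plain policy-gradient term.

```python
import math

# Hypothetical config keys; map them onto whatever the PPO trainer actually exposes.
A2C_AS_PPO = {
    "gae_lambda": 1.0,
    "normalize_advantages": False,
    "num_epochs": 1,
    "num_minibatches": 1,
    "clip_value_loss": False,
}


def gae_advantages(rewards, values, last_value, dones, gamma, lam):
    """GAE over one rollout; dones[t] = 1.0 if the episode ended after step t."""
    advantages = [0.0] * len(rewards)
    gae, next_value = 0.0, last_value
    for t in reversed(range(len(rewards))):
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def nstep_advantages(rewards, values, last_value, dones, gamma):
    """A2C-style advantage: bootstrapped discounted return minus the value estimate."""
    advantages = [0.0] * len(rewards)
    ret = last_value
    for t in reversed(range(len(rewards))):
        ret = rewards[t] + gamma * ret * (1.0 - dones[t])
        advantages[t] = ret - values[t]
    return advantages


def ppo_clip_objective(logp_new, logp_old, adv, clip_eps=0.2):
    """PPO clipped surrogate for a single sample."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * adv, clipped * adv)


# Toy rollout (made-up numbers) with an episode boundary after step 2.
rewards = [1.0, 0.0, 2.0, 1.0]
values = [0.5, 0.4, 0.3, 0.2]
dones = [0.0, 0.0, 1.0, 0.0]
last_value, gamma = 0.7, 0.99

adv_gae = gae_advantages(rewards, values, last_value, dones, gamma,
                         lam=A2C_AS_PPO["gae_lambda"])
adv_a2c = nstep_advantages(rewards, values, last_value, dones, gamma)

# Finite-difference gradient of the surrogate w.r.t. logp_new at logp_new == logp_old.
logp_old, h = -1.2, 1e-6
grad_ppo = [
    (ppo_clip_objective(logp_old + h, logp_old, a)
     - ppo_clip_objective(logp_old - h, logp_old, a)) / (2 * h)
    for a in adv_a2c
]
```

Here `adv_gae` and `adv_a2c` agree up to floating-point error, and `grad_ppo` matches `adv_a2c`, which is the gradient of the A2C term `logp * advantage` with respect to `logp`. The optimizer and rollout length would still need to match an A2C baseline for the training runs to line up; this sketch only covers the loss side.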
Robert Tjarko Lange commented
Good point, I didn't know about this equivalence. For the meta-RL setups I may have to write some extra logic but will try to keep things minimal.