RobertTLange / gymnax-blines

Baselines for gymnax 🤖

Add A2C implementation

RobertTLange opened this issue

Reminder: to do after the internship.

Mostly for the meta-bandit and gridworld tasks.

Suggestion: you could probably just implement it as PPO with fixed hyperparameters (GAE λ = 1, no advantage normalization, 1 epoch, 1 minibatch, no value clipping), as per "A2C is a Special Case of PPO".
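To illustrate why those settings should reduce PPO to A2C, here is a minimal sketch in plain Python (standard library only, not the gymnax-blines code). It assumes a toy single-state softmax policy, and all names (`a2c_loss`, `ppo_loss`, `finite_diff_grad`) and the sample data are hypothetical. With one epoch over one minibatch, the only gradient step is taken where the new policy equals the old one, so the probability ratio is 1 and the clip is inactive; the PPO policy gradient there should match the A2C policy gradient. The sketch only covers the policy loss; advantages are taken as given.

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def a2c_loss(logits, actions, advantages):
    # Vanilla policy-gradient surrogate: -mean(log pi(a) * A).
    logp = log_softmax(logits)
    return -sum(logp[a] * adv for a, adv in zip(actions, advantages)) / len(actions)

def ppo_loss(logits, old_logits, actions, advantages, clip_eps=0.2):
    # PPO clipped surrogate: -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A)).
    logp = log_softmax(logits)
    old_logp = log_softmax(old_logits)
    total = 0.0
    for a, adv in zip(actions, advantages):
        ratio = math.exp(logp[a] - old_logp[a])
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(actions)

def finite_diff_grad(loss_fn, logits, h=1e-5):
    # Central finite differences, used here instead of autodiff
    # to keep the sketch dependency-free.
    grad = []
    for i in range(len(logits)):
        up = list(logits)
        up[i] += h
        down = list(logits)
        down[i] -= h
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * h))
    return grad

# Hypothetical toy data: 3 actions, a batch of 4 samples.
theta = [0.3, -0.1, 0.5]
actions = [0, 2, 1, 2]
advantages = [1.0, -0.5, 0.25, 2.0]

# Both gradients are evaluated at theta == theta_old, which is the only
# point where a single-epoch, single-minibatch PPO update takes its gradient.
grad_a2c = finite_diff_grad(lambda t: a2c_loss(t, actions, advantages), theta)
grad_ppo = finite_diff_grad(lambda t: ppo_loss(t, theta, actions, advantages), theta)

max_gap = max(abs(x - y) for x, y in zip(grad_a2c, grad_ppo))
print("max gradient gap:", max_gap)
```

The loss values themselves differ (the PPO surrogate at ratio 1 is just the negative mean advantage), but only the gradients matter for the update. Advantage normalization would rescale the advantages and break the match, which is presumably why it is switched off in the suggested settings.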

Good point, I didn't know about this equivalence. For the meta-RL setups I may have to write some extra logic but will try to keep things minimal.