Off Policy Trainer

Question

Sharad24 opened this issue 4 years ago

The Off Policy Trainer does not depend on epochs. It depends only on max_timesteps, which, even in the worst case, should not be used without epochs.

When working on this, also remove the max_timesteps=100 from every off-policy agent test; it is only there for now to reduce the test times in #368.
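
For illustration, here is a rough sketch of what an epoch-dependent off-policy loop might look like, with max_timesteps acting as a per-epoch cap rather than the only stopping condition. This is not the repository's trainer: `ToyEnv`, `train_off_policy`, the replay buffer, and the random action choice are all hypothetical stand-ins, and the "update" is just a counter where a gradient step would go.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical stand-in environment: every episode ends after 5 steps."""

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.t += 1
        done = self.t >= 5
        return float(self.t), 1.0, done  # next_state, reward, done


def train_off_policy(env, epochs, max_timesteps, batch_size=4, seed=0):
    """Run `epochs` epochs, each capped at `max_timesteps` environment steps."""
    rng = random.Random(seed)
    buffer = deque(maxlen=1000)  # simple replay buffer (assumption)
    updates = 0
    steps_per_epoch = []
    for epoch in range(epochs):
        state = env.reset()
        steps = 0
        for _ in range(max_timesteps):
            action = rng.choice([0, 1])  # random policy, illustration only
            next_state, reward, done = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            steps += 1
            if len(buffer) >= batch_size:
                batch = rng.sample(list(buffer), batch_size)
                updates += 1  # placeholder for an agent update on `batch`
            # Start a new episode when the current one ends.
            state = env.reset() if done else next_state
        steps_per_epoch.append(steps)
    return steps_per_epoch, updates


steps_per_epoch, updates = train_off_policy(ToyEnv(), epochs=3, max_timesteps=10)
```

With this shape, the total number of environment steps is epochs × max_timesteps, so tests could shrink either value instead of hard-coding max_timesteps alone.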