Off Policy Trainer
Sharad24 opened this issue
Sharad Chitlangia commented
The Off Policy Trainer does not depend on epochs. It depends only on max_timesteps,
which even in the worst case should not be used without epochs.
When working on this, also remove the max_timesteps=100 that was added, for now,
to every off-policy agent test in #368 to reduce test times.
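
To illustrate the request, here is a minimal sketch of an epoch-driven loop in which max_timesteps is only an optional cap. It is not the trainer's actual code: `train_off_policy`, `step_fn`, and `steps_per_epoch` are hypothetical names made up for this example.

```python
from typing import Callable, Optional


def train_off_policy(
    step_fn: Callable[[int], None],
    epochs: int,
    steps_per_epoch: int,
    max_timesteps: Optional[int] = None,
) -> int:
    """Run an epoch-driven loop and return the total number of timesteps taken.

    step_fn is a hypothetical callback standing in for one environment step
    plus the replay-buffer/agent update; it receives the global timestep.
    """
    timestep = 0
    for _ in range(epochs):
        for _ in range(steps_per_epoch):
            # max_timesteps acts only as an optional upper bound;
            # the epochs are what drive the loop.
            if max_timesteps is not None and timestep >= max_timesteps:
                return timestep
            step_fn(timestep)
            timestep += 1
    return timestep


if __name__ == "__main__":
    seen = []
    total = train_off_policy(seen.append, epochs=2, steps_per_epoch=3)
    print(total, seen)
```

With a loop shaped like this, tests could keep run time low by passing a small number of epochs instead of hard-coding max_timesteps=100.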