Farama-Foundation / HighwayEnv

A minimalist environment for decision-making in autonomous driving

Home Page: https://highway-env.farama.org/


load model retraining consumes exponential time

ccccccc17 opened this issue · comments

I am building a continual learning framework with three tasks. I noticed that each time I load the model and continue training on a new task, it takes much longer than training from scratch. I am wondering why this happens and how to accelerate it.
Thanks!

That's a good question... Are you training for a given number of steps, or episodes? If the latter, one hypothesis: when you load a pretrained model, the agent does a better job at avoiding collisions (compared to a newly initialised model), which means that episodes will last longer and take more time to simulate. Wdyt?
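To make the hypothesis above concrete, here is a toy calculation (with made-up numbers) showing why a fixed number of *episodes* costs more wall time once the agent learns to survive longer: the total number of simulated steps scales with mean episode length.

```python
# Hypothetical illustration: same episode budget, different mean episode
# lengths. A pretrained agent that avoids collisions produces longer
# episodes, hence more environment steps to simulate.
def total_env_steps(n_episodes, mean_episode_length):
    return n_episodes * mean_episode_length

fresh = total_env_steps(1000, mean_episode_length=12)   # crashes early
loaded = total_env_steps(1000, mean_episode_length=35)  # survives longer
print(fresh, loaded)  # the loaded model simulates ~3x more steps
```

Training for a fixed number of *steps* (as with SB3's `total_timesteps`) avoids this effect, since the step budget is the same regardless of episode length.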

I use the model.learn() function to pass the hyper-parameters, where the total number of learning steps is 2e5 for each subtask.

Btw, I got a user warning saying that 'a Box observation space has an unconventional shape'. Do you think I should be worried? I am using the dqn_cnn scripts to run a highway-fast env.

The GrayscaleImage observation has a so-called "unconventional shape" because it's [W, H] only, while SB3 expects [W, H, C] for an image. I don't think it is worrying as long as it's handled correctly in SB3, but I can't say I remember exactly what they do. If they do nothing and feed the observation to a CNN directly, it's probably fine.
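If you want to silence the warning yourself, one option is to add an explicit channel axis to the [W, H] observation before handing it to SB3. A minimal NumPy sketch (the shapes here are hypothetical, not highway-env defaults):

```python
import numpy as np

# A GrayscaleImage observation arrives as a 2-D [W, H] array; adding a
# trailing channel axis yields the [W, H, C] layout SB3 expects for images.
obs = np.zeros((128, 64), dtype=np.float32)  # [W, H] grayscale frame
obs_whc = obs[..., np.newaxis]               # -> [W, H, 1]
print(obs_whc.shape)  # (128, 64, 1)
```

In practice this could live in a small observation wrapper so every step and reset output is reshaped consistently.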

I got a new one. Sorry about occupying so much time. I am trying to retrieve evaluation statistics, for instance, episode rewards. What are some hints about that? Or maybe you have a code snippet I could learn from.

The rewards are obtained at every environment step:

obs, reward, done, truncated, info = env.step(action)

You can sum them across timesteps until a terminal state is reached to get the episode rewards.
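The accumulation loop can be sketched as follows. `DummyEnv` below is a stand-in I made up so the example runs without a simulator; highway-env follows the same gymnasium-style `step()` contract (obs, reward, terminated, truncated, info).

```python
class DummyEnv:
    """Hypothetical stand-in for a gymnasium-style env with a fixed reward sequence."""
    def __init__(self, rewards):
        self._rewards = list(rewards)
    def reset(self):
        self._t = 0
        return 0.0, {}  # obs, info
    def step(self, action):
        reward = self._rewards[self._t]
        self._t += 1
        done = self._t == len(self._rewards)
        return 0.0, reward, done, False, {}  # obs, reward, terminated, truncated, info

def episode_return(env, policy):
    # Sum rewards across timesteps until a terminal or truncated state.
    obs, info = env.reset()
    total, done, truncated = 0.0, False, False
    while not (done or truncated):
        obs, reward, done, truncated, info = env.step(policy(obs))
        total += reward
    return total

print(episode_return(DummyEnv([0.5, 0.5, 1.0]), policy=lambda obs: 0))  # 2.0
```

Collecting `episode_return` over several evaluation episodes then gives you the statistics (mean, std) you are after.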