christianhidber / easyagents

Reinforcement Learning for Practitioners.


Is it possible to resume training with a saved model?

oortlieb opened this issue

I'm training a model in an environment where episodes are fairly slow -- each one takes around 5 seconds of real time. I'm currently saving models out with the save.Best() callback.

Is there a way to load the saved model and continue training it? I'm trying to load it back in via the agents.load function, but the behavior of the agent makes it seem like training is starting over from scratch. Here's the code I'm using to try to load the model:

import sys

# Imports added to make the snippet self-contained; module paths assume
# the usual easyagents layout.
from easyagents import agents
from easyagents.agents import PpoAgent
from easyagents.callbacks import log, plot, save
from easyagents.env import register_with_gym

if __name__ == "__main__":
    # MyEnvV0 is my custom environment class (defined elsewhere).
    register_with_gym(gym_env_name='MyEnvV0', entry_point=MyEnvV0)

    # Optionally take a checkpoint directory as the first CLI argument.
    checkpoint_dir = None
    if len(sys.argv) > 1:
        checkpoint_dir = sys.argv[1]

    if checkpoint_dir is not None:
        # Load the previously saved model ...
        agent = agents.load(checkpoint_dir)
    else:
        # ... or create a fresh agent with a fixed seed.
        agents.seed = 0
        agent = PpoAgent('MyEnvV0')

    print(agent)

    agent.train(
        [save.Best("./models"), plot.State(), plot.Loss(), plot.Rewards(),
         plot.Actions(), log.Iteration()],
        learning_rate=0.0001,
        num_iterations=1000000,
        max_steps_per_episode=1000
    )

Hi Oliver

Yes, you are right: when you load a previously saved model and start training, it starts from scratch. We had a similar issue a few weeks ago where a single episode could run for several minutes. Currently you cannot easily continue training. To support that, I guess we would have to make sure that all of the training state is restored, which depends on the algorithm and on the underlying library. But I fully agree, that would be a very convenient feature.
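To make "training state" concrete: with the TF-Agents backend, for example, truly resuming would mean checkpointing the networks, the optimizer state and the step counter together, e.g. via TF-Agents' common.Checkpointer. A minimal sketch under that assumption (this is not an easyagents API, and make_checkpointer is a hypothetical helper):

from tf_agents.utils import common

def make_checkpointer(ckpt_dir, tf_agent, global_step):
    # Sketch only: tracks the agent (networks + optimizer state) and the
    # step counter. For off-policy algorithms the replay buffer would
    # have to be tracked as well.
    return common.Checkpointer(
        ckpt_dir=ckpt_dir,
        max_to_keep=3,
        agent=tf_agent,
        global_step=global_step,
    )

# On startup: restore the latest checkpoint if one exists, otherwise
# initialize from scratch.
#   global_step = tf.compat.v1.train.get_or_create_global_step()
#   checkpointer = make_checkpointer('./checkpoints', tf_agent, global_step)
#   checkpointer.initialize_or_restore()
#
# During training, persist periodically:
#   checkpointer.save(global_step)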

By the way - if you don't mind and can talk about it - what kind of problem are you applying RL to? I am always very interested in hearing about current use cases.

All the best

I'm new to RL, but have a couple of specific applications that I'm interested in:

  1. Optimization problems - Building an MDP/gym description of an optimization problem seems like a promising way to avoid designing and tuning heuristics (which takes a lot of time, especially in a new domain). My first RL "toy project" was a variant of the 2D bin packing problem; a sketch of that kind of environment follows this list. I was able to get some decent results (with easyagents!), though I don't think I did everything quite correctly -- there is definitely room for improvement in even the best of the learned policies.
  2. Video games - The classic answer :) I'm interested in creating a strategy game where the player can only control a subset of the characters on their team, and I'm looking to avoid hand-tuning behaviors. I don't really like using Unity, so I've been trying to piece together an RL toolchain with Godot and GodotAIGym (https://github.com/lupoglaz/GodotAIGym). I haven't had much success training agents inside Godot yet.
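To give a flavor of the bin-packing formulation from point 1, here is a minimal, hypothetical gym.Env skeleton (the class name, spaces and reward are invented for illustration, not my actual environment); it could be registered via register_with_gym like the environment in my snippet above:

import gym
import numpy as np
from gym import spaces

class BinPackingEnvV0(gym.Env):
    """Hypothetical sketch: one episode packs a sequence of items."""

    def __init__(self, num_positions=10, num_items=20):
        self.num_positions = num_positions
        self.num_items = num_items
        # Action: index of the position to place the current item at.
        self.action_space = spaces.Discrete(num_positions)
        # Observation: occupancy of each position plus the current item size.
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=(num_positions + 1,), dtype=np.float32)

    def reset(self):
        self._occupancy = np.zeros(self.num_positions, dtype=np.float32)
        self._items_left = self.num_items
        self._item = np.random.uniform(0.1, 0.5)
        return self._observation()

    def step(self, action):
        # Reward a successful placement; penalize overflowing a position.
        overflow = self._occupancy[action] + self._item > 1.0
        reward = -1.0 if overflow else float(self._item)
        if not overflow:
            self._occupancy[action] += self._item
        self._items_left -= 1
        done = self._items_left == 0
        self._item = np.random.uniform(0.1, 0.5)
        return self._observation(), reward, done, {}

    def _observation(self):
        return np.append(self._occupancy, self._item).astype(np.float32)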

Splendid, that sounds exciting. All the best.