takuseno / d3rlpy

An offline deep reinforcement learning library

Home Page: https://takuseno.github.io/d3rlpy

[REQUEST] Allow dynamic neural architectures to be used

johnHostetter opened this issue

Hi there! Wonderful library, and I want to say thank you for all that you do in maintaining this repository.

Is your feature request related to a problem? Please describe.
I research self-organizing and morphetic neural architectures and apply them to offline RL settings. I enjoy using your library, but I find it difficult to incorporate the dynamic nature of these networks. The main problem is that when new tunable parameters are added dynamically to a (policy/target) network, the associated optimizer does not reference them.
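To make the problem concrete, here is a minimal sketch in plain PyTorch, independent of d3rlpy. A PyTorch optimizer only updates the tensors it was given at construction time, so parameters created later have to be registered explicitly, for example with `add_param_group`. The helper name `register_new_parameters` and the toy network are hypothetical and only serve as illustration.

```python
import torch
from torch import nn


def register_new_parameters(optimizer: torch.optim.Optimizer, module: nn.Module) -> int:
    """Add trainable parameters of `module` that the optimizer does not track yet.

    Hypothetical helper (not part of d3rlpy). Returns how many were registered.
    """
    # Compare by identity: tensor equality is elementwise and would be wrong here.
    tracked = {id(p) for group in optimizer.param_groups for p in group["params"]}
    new_params = [
        p for p in module.parameters() if p.requires_grad and id(p) not in tracked
    ]
    if new_params:
        # The new group falls back to the optimizer's default hyperparameters.
        optimizer.add_param_group({"params": new_params})
    return len(new_params)


# Toy example: a layer is attached after the optimizer has been created.
net = nn.Sequential(nn.Linear(4, 8))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
net.add_module("grown", nn.Linear(8, 2))  # weight and bias are unknown to Adam
added = register_new_parameters(optimizer, net)
```

This only covers parameters that are new tensors. If growing a layer replaces an existing parameter with a larger tensor, the stale entry and its optimizer state (for example Adam's moment estimates) would presumably need separate handling.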

Describe the solution you'd like
A clean way to re-initialize the optimizer when new weights/parameters have been added during the .fit() call. My current method is very "hack-y" and messy, and I have to maintain a separate fork in which I override code that prevents modification (such as frozen dataclasses). Further, if the policy and target networks' parameters do not align perfectly, the hard_sync method throws an error. In my work I instead use a "sync if possible" approach: the parameters in the intersection are synced, and otherwise the networks' weights are unioned together.
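The intersection half of the "sync if possible" idea could be sketched roughly as follows. This is an illustration in plain PyTorch, not the code from my fork and not d3rlpy's hard_sync. The name `sync_if_possible` is hypothetical, and the union step (adding the missing structure to the other network) is deliberately left out.

```python
import torch
from torch import nn


def sync_if_possible(source: nn.Module, target: nn.Module) -> list:
    """Copy every entry of `source` whose name and shape match into `target`.

    Hypothetical helper. Returns the sorted names that could not be synced.
    """
    source_state = source.state_dict()
    target_state = target.state_dict()
    compatible = {
        name: tensor
        for name, tensor in source_state.items()
        if name in target_state and target_state[name].shape == tensor.shape
    }
    # strict=False tolerates target keys that are absent from `compatible`;
    # shape mismatches were already filtered out above.
    target.load_state_dict(compatible, strict=False)
    return sorted((set(source_state) | set(target_state)) - set(compatible))


# Toy example: the policy network has grown a layer that the target lacks.
policy = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
target = nn.Sequential(nn.Linear(4, 8))
skipped = sync_if_possible(policy, target)
```

Here the shared first layer is copied and the names of the extra layer's weight and bias are returned, so the caller can decide how to handle them.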

Additional context
Although dynamically adding new weights/parameters to a network can cause a drop in performance, it is possible to add them without one: https://proceedings.mlr.press/v48/wei16.html

I can also try to do this by submitting a PR if you could point me in the right direction or suggest changes you would be comfortable with me making. As it stands, my current code for this is not great and likely breaks features of the library that I do not use in my work. I was hoping you might be able to suggest ideas for how to do this easily and cleanly instead. Thanks for your time!

@johnHostetter Hi, thank you for your request! Supporting dynamically changing architectures is tricky indeed. One direction I can propose is to design a dedicated algorithm for this use case, so that it shares as little code as possible with the existing algorithms.

Hi @takuseno, thank you for your reply! Sorry for my delayed response - I am preparing to propose my dissertation work in the coming month.

I think a dedicated algorithm could work, but it may lead to verbose or redundant algorithm definitions. Perhaps it would be possible to instead delegate it: if the model (e.g., DQN) is identified as an instance of DynamicModule (or whatever name is appropriate), the algorithm would allow the relevant behavior to dynamically modify its architecture. This could make use of the existing callbacks in the algorithms' .fit implementations. More specifically, the library's existing EncoderFactory could possibly be expanded to produce these dynamic networks automatically. Otherwise, if the model is just a regular torch.nn.Module, no special dynamic architecture handling is required (the implementation stays as it is now).

So my proposed solution is to create a DynamicEncoderFactory that generates a special torch.nn.Module. The library would recognize this module and trigger certain default callbacks that let the model's parameters grow or shrink while keeping the optimizer aware of them. If this sounds like something you would be interested in, I can work on an implementation with a minimal code footprint so as not to disrupt the library's structure.
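A rough sketch of the marker-class idea, written against plain PyTorch so it runs without d3rlpy. Everything named here (`DynamicModule`, `maybe_grow`, `GrowingMLP`, `make_growth_callback`) is hypothetical, and the callback signature `(algo, epoch, total_step)` is an assumption about what .fit() passes to its callback. How the encoder and optimizer would actually be obtained from the algorithm object is left open.

```python
import torch
from torch import nn


class DynamicModule(nn.Module):
    """Hypothetical marker base class for networks that can change size."""

    def maybe_grow(self, total_step: int) -> bool:
        """Return True if the architecture changed at this step."""
        raise NotImplementedError


class GrowingMLP(DynamicModule):
    """Toy network that appends one layer every `grow_every` steps."""

    def __init__(self, width: int, grow_every: int) -> None:
        super().__init__()
        self.width = width
        self.grow_every = grow_every
        self.layers = nn.ModuleList([nn.Linear(width, width)])

    def maybe_grow(self, total_step: int) -> bool:
        if total_step == 0 or total_step % self.grow_every != 0:
            return False
        self.layers.append(nn.Linear(self.width, self.width))
        return True

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            x = torch.relu(layer(x))
        return x


def make_growth_callback(module: nn.Module, optimizer: torch.optim.Optimizer):
    """Build a per-step callback; regular modules are left untouched."""

    def callback(algo, epoch, total_step):  # assumed callback signature
        if not isinstance(module, DynamicModule):
            return  # plain torch.nn.Module: no special handling
        if module.maybe_grow(total_step):
            tracked = {id(p) for g in optimizer.param_groups for p in g["params"]}
            fresh = [p for p in module.parameters() if id(p) not in tracked]
            if fresh:
                optimizer.add_param_group({"params": fresh})

    return callback


# Simulate 20 training steps without a real algorithm object.
net = GrowingMLP(width=4, grow_every=10)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
callback = make_growth_callback(net, optimizer)
for step in range(1, 21):
    callback(None, 1, step)
```

The isinstance check keeps the default path unchanged for ordinary modules, which is the main reason I would prefer this over a separate algorithm class. Note that the toy layers are randomly initialized, so this sketch does not preserve the network's function when it grows.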