SforAiDl / genrl

A PyTorch reinforcement learning library for generalizable and reproducible algorithm implementations with an aim to improve accessibility in RL

Home Page: https://genrl.readthedocs.io

CNN parameters not being updated for ACs

hades-rp2010 opened this issue

The weights of the CNN layers don't get updated in the CNN actor-critic (AC) classes. Concretely, this is what I mean:

The CNN actor-critic class:

class CNNActorCritic(BaseActorCritic):
    def __init__(self, *args, **kwargs):  # signature abbreviated in this excerpt
        super(CNNActorCritic, self).__init__()
        self.feature, output_size = cnn((framestack, 16, 32))
        self.actor = MlpPolicy(
            output_size, action_dim, policy_layers, discrete, **kwargs
        )
        self.critic = MlpValue(output_size, action_dim, val_type, value_layers)

    def get_action(
        self, state: torch.Tensor, deterministic: bool = False
    ) -> torch.Tensor:
        state = self.feature(state)
        state = state.view(state.size(0), -1)

        action_probs = self.actor(state)
        action_probs = nn.Softmax(dim=-1)(action_probs)
        # Some lines deleted
        return action, distribution

    def get_value(self, inp: torch.Tensor) -> torch.Tensor:
        inp = self.feature(inp)
        inp = inp.view(inp.size(0), -1)

        value = self.critic(inp).squeeze(-1)
        return value

Above, self.feature is the nn.Sequential object (the CNN) that both get_action and get_value run the input through.
But the optimizers in the agents are built like this:

self.optimizer_policy = opt.Adam(self.ac.actor.parameters(), lr=self.lr_policy)
self.optimizer_value = opt.Adam(self.ac.critic.parameters(), lr=self.lr_value)

self.ac.actor.parameters() and self.ac.critic.parameters() cover only the params of the actor head and the critic head. So what about self.ac.feature.parameters()? They are not passed to either optimizer, so they do not seem to be updated anywhere.

I think the feature params should be added to both optimizers. Not sure, though.
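A minimal sketch of that suggestion, not the actual genrl code: TinyCNNActorCritic, its layer sizes, and the learning rates are made up for illustration. It only shows how the trunk's parameters could be chained into both optimizers so that they are registered for updates.

```python
import itertools

import torch
import torch.nn as nn
import torch.optim as opt


class TinyCNNActorCritic(nn.Module):
    """Hypothetical stand-in for CNNActorCritic: a shared CNN trunk and two heads."""

    def __init__(self, framestack: int = 4, action_dim: int = 3):
        super().__init__()
        # Shared feature extractor (plays the role of self.feature in the issue).
        self.feature = nn.Sequential(
            nn.Conv2d(framestack, 8, kernel_size=3, stride=2),
            nn.ReLU(),
        )
        # For an 8x8 input the conv output is 8 channels x 3 x 3 = 72 features.
        self.actor = nn.Linear(8 * 3 * 3, action_dim)
        self.critic = nn.Linear(8 * 3 * 3, 1)


ac = TinyCNNActorCritic()

# Assumed fix: hand the trunk's parameters to each optimizer along with the head's.
optimizer_policy = opt.Adam(
    itertools.chain(ac.feature.parameters(), ac.actor.parameters()), lr=3e-4
)
optimizer_value = opt.Adam(
    itertools.chain(ac.feature.parameters(), ac.critic.parameters()), lr=1e-3
)

# Collect the identities of every tensor the policy optimizer will step.
policy_param_ids = {
    id(p) for group in optimizer_policy.param_groups for p in group["params"]
}
feature_registered = all(id(p) in policy_param_ids for p in ac.feature.parameters())
print("feature params in policy optimizer:", feature_registered)
```

One thing to weigh with this layout: the shared trunk is stepped by both optimizers, each with its own Adam state and learning rate. A single optimizer over ac.parameters() with a combined loss is the other common option.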

Yeah, checked by printing out the weights. I'll add this too in my current PR.
I'll close this for now?
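The weight check mentioned above can also be done programmatically instead of by eye. This is a rough sketch on a toy trunk and critic head (shapes, loss, and the feature_changes helper are all assumptions, not genrl code): snapshot the trunk's weights, take one optimizer step, and compare.

```python
import torch
import torch.nn as nn
import torch.optim as opt

torch.manual_seed(0)

# Toy stand-ins for self.ac.feature and self.ac.critic (hypothetical shapes).
feature = nn.Sequential(nn.Conv2d(4, 8, kernel_size=3, stride=2), nn.ReLU())
critic = nn.Linear(8 * 3 * 3, 1)


def feature_changes(optimizer: opt.Optimizer) -> bool:
    """Run one value-loss step and report whether any trunk tensor changed."""
    before = [p.detach().clone() for p in feature.parameters()]

    obs = torch.randn(2, 4, 8, 8)  # batch of 2 fake 8x8 observations
    value = critic(feature(obs).view(2, -1)).squeeze(-1)
    loss = (value - torch.ones(2)).pow(2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    return any(
        not torch.equal(old, new.detach())
        for old, new in zip(before, feature.parameters())
    )


# Optimizer built the way the issue describes: head parameters only.
head_only = opt.Adam(critic.parameters(), lr=1e-2)
changed_head_only = feature_changes(head_only)

# Optimizer that also receives the trunk's parameters.
with_trunk = opt.Adam(
    list(feature.parameters()) + list(critic.parameters()), lr=1e-2
)
changed_with_trunk = feature_changes(with_trunk)

print("head-only optimizer changed the CNN:", changed_head_only)
print("trunk-aware optimizer changed the CNN:", changed_with_trunk)
```

With the head-only optimizer the trunk still receives gradients during backward(), but nothing ever applies them, which matches what the printed weights showed.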

#307 takes care of this.
Closing this for now.