rll-research / url_benchmark


Encoder updating in ICM implementations

xf-zhao opened this issue · comments

Hi, thank you all for this remarkable work. I found the code very well constructed.

I have one question about the ICM implementation. I noticed that the encoder is only updated by the loss of the forward+inverse prediction model, and is not updated when the critic networks update (since obs is detached when calling self.update_critic), even though there is a parameter update_encoder=True that should control this behaviour (see url_benchmark/agent/icm.py, lines 118-125, quoted below).

        if not self.update_encoder:
            obs = obs.detach()
            next_obs = next_obs.detach()

        # update critic
        metrics.update(
            self.update_critic(obs.detach(), action, reward, discount,
                               next_obs.detach(), step))
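The effect of the unconditional detach can be demonstrated with a few lines of PyTorch. This is a minimal sketch, not the repo's actual modules: a hypothetical linear encoder and critic head stand in for the agent's networks, just to show which case lets critic gradients reach the encoder.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the agent's encoder and critic.
encoder = nn.Linear(4, 8)
critic = nn.Linear(8, 1)

x = torch.randn(2, 4)

# Case 1: obs is detached before the critic update (as in the quoted code).
obs = encoder(x).detach()
critic(obs).sum().backward()
print(encoder.weight.grad)  # None: no gradient reaches the encoder

# Case 2: obs stays attached (what update_encoder=True presumably intends).
encoder.zero_grad()
critic.zero_grad()
obs = encoder(x)
critic(obs).sum().backward()
print(encoder.weight.grad is not None)  # True: critic loss trains the encoder
```

So with the current code, update_encoder only toggles whether the ICM loss itself can reach the encoder; the critic never does either way.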

I guess this was a choice made after testing with it on and off? But if so, it raises another question: the encoder is trained during the pretraining procedure, while the randomly initialized one ("random init" in the paper) is not trained at all. So when comparing them, we cannot conclude that the representations learned with ICM are better than those obtained from random exploration.

Thank you in advance!

I also have the same question. DDPG updates the encoder when training the critic, but APT-ICM trains the encoder only with the ICM loss. In my view, that does not seem sufficient.