pathak22 / noreward-rl

[ICML 2017] TensorFlow code for Curiosity-driven Exploration for Deep Reinforcement Learning


Feature normalization?

unrealwill opened this issue · comments

Hello, I just read the paper today, and there are two points that remain unclear to me.
I looked at the code to try to understand them better, but they are still not clear.

The first point:
In model.py, the feature functions that transform the input state into feature space are defined in nipsHead, universeHead, ...
In these definitions and their usage, I see no trace of normalization (something like an L2 normalize).
I would expect to see a normalization, because it seems very easy for the network to cheat: if it wants to maximize the reward, it just has to scale the features up (and scale them back down in the inverse model to avoid being penalized).
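To make the scaling concern concrete, here is a minimal numpy sketch (not code from this repo; the names phi_next, phi_pred and the feature size are made up) showing that a curiosity reward based on squared error in feature space grows quadratically when the embedding is uniformly scaled up, and that L2-normalizing the features removes that degree of freedom:

```python
import numpy as np

# Hypothetical feature embedding of s_{t+1} and the forward model's
# prediction of it; these names do not come from model.py.
phi_next = np.random.randn(1, 256)
phi_pred = phi_next + 0.1 * np.random.randn(1, 256)  # small prediction error

def intrinsic_reward(pred, target):
    # Curiosity reward ~ squared error in feature space.
    return 0.5 * np.sum((pred - target) ** 2)

# Scaling the feature space up by a factor k inflates the reward by k^2,
# which is the "cheating" direction described above.
for k in [1.0, 10.0, 100.0]:
    print(k, intrinsic_reward(k * phi_pred, k * phi_next))

def l2_normalize(x, eps=1e-8):
    # Project each feature vector onto the unit sphere.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

# With L2 normalization the reward is invariant to that uniform scaling.
for k in [1.0, 10.0, 100.0]:
    print(k, intrinsic_reward(l2_normalize(k * phi_pred), l2_normalize(k * phi_next)))
```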

The second point:
It seems to me that every time the parameters of the feature function are modified, the intrinsic rewards, and therefore the rewards for the whole episode, change as well. We would then need to recompute the generalized advantages for the whole episode. Does this mean that we must process episodes in their entirety? How does this interact with experience replay? Is there an approximation that avoids recomputing the advantages after an update?
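For reference, this is roughly what I mean by recomputing the generalized advantages over a whole rollout once the intrinsic rewards have changed. It is a generic GAE sketch in plain numpy with placeholder values, not the actual training code in this repo:

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=1.0):
    # Generalized advantage estimation over a full trajectory.
    # `values` has one extra bootstrap entry V(s_T).
    deltas = rewards + gamma * values[1:] - values[:-1]
    adv = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        adv[t] = running
    return adv

# If the feature network changes, the intrinsic rewards change too, so the
# advantages for the whole rollout would have to be recomputed from scratch:
T = 20
extrinsic = np.zeros(T)                # e.g. sparse or absent extrinsic reward
intrinsic = np.random.rand(T) * 0.01   # placeholder for recomputed curiosity rewards
values = np.random.randn(T + 1)        # V(s_0), ..., V(s_T); last entry is the bootstrap
advantages = gae(extrinsic + intrinsic, values)
```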

Thanks.