Confusion: Why are you adding 1 here?

Question

AdityaGudimella opened this issue 5 years ago

When calculating the scaled log_std in the SAC policy, you add 1 to log_std before scaling it to the range [LOG_STD_MIN, LOG_STD_MAX]. Is this because the range of the tanh function is (-1, 1)?
Is it really necessary? Wouldn't the scaling limit the output to [LOG_STD_MIN, LOG_STD_MAX] even without the + 1?

https://github.com/chutaklee/firedup/blob/ed3634525703f3169b190f6e7951d69c38a5372d/fireup/algos/sac/core.py#L92-L93

Kashif Rasul · Answer 1 · Mon Sep 16 2019 17:42:04 GMT+0800 (China Standard Time)

Yes. In the end I want a linear (affine) transformation that maps (-1, 1) to (LOG_STD_MIN, LOG_STD_MAX), and since this transformation is unique, any other form of it will reduce to the same formula I have.
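To spell out the uniqueness argument, here is a short derivation. The symbols f, a, b and x are introduced only for illustration (x stands for the tanh output), and the final form is what I assume the linked lines compute, based on the description in the question:

$$
f(x) = a x + b, \qquad f(-1) = \text{LOG\_STD\_MIN}, \qquad f(1) = \text{LOG\_STD\_MAX}
$$

$$
a = \tfrac{1}{2}\,(\text{LOG\_STD\_MAX} - \text{LOG\_STD\_MIN}), \qquad b = \tfrac{1}{2}\,(\text{LOG\_STD\_MAX} + \text{LOG\_STD\_MIN})
$$

$$
f(x) = \text{LOG\_STD\_MIN} + \tfrac{1}{2}\,(\text{LOG\_STD\_MAX} - \text{LOG\_STD\_MIN})\,(x + 1)
$$

Two endpoint conditions fix the two unknowns a and b, so there is only one such map, and the + 1 is simply part of writing it in this factored form.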

The way I, and I suppose the original author, thought of it was: shift (-1, 1) by +1 to get (0, 2), multiply by 0.5 to get (0, 1), then stretch and shift that onto (LOG_STD_MIN, LOG_STD_MAX). Hope that helps!
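The steps above can be sketched in plain Python. This is a minimal illustration, not the code from the linked file: the constant values and both function names are assumptions chosen for the example. The second function drops the + 1 to show what the original question was asking about.

```python
import math

# Assumed values for illustration; the linked file may define different ones.
LOG_STD_MIN = -20.0
LOG_STD_MAX = 2.0


def scale_log_std(raw):
    """Map an unbounded value onto (LOG_STD_MIN, LOG_STD_MAX) via tanh."""
    squashed = math.tanh(raw)   # in (-1, 1)
    shifted = squashed + 1.0    # in (0, 2)
    unit = 0.5 * shifted        # in (0, 1)
    # Stretch (0, 1) to the target width, then offset by the lower bound.
    return LOG_STD_MIN + (LOG_STD_MAX - LOG_STD_MIN) * unit


def scale_without_shift(raw):
    """Hypothetical variant that omits the + 1, for comparison only."""
    return LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * math.tanh(raw)


if __name__ == "__main__":
    for raw in (-100.0, 0.0, 100.0):
        print(raw, scale_log_std(raw), scale_without_shift(raw))
```

Without the + 1 the output is still bounded, but the interval is centred on LOG_STD_MIN rather than starting there, so with these assumed constants it can fall well below LOG_STD_MIN and never gets close to LOG_STD_MAX.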