ikostrikov / walk_in_the_park

Hi,

in line 127 (commit 40321ec),

q = qs.mean(axis=0)

you update the actor based on the mean over all Q functions.
In the SB3 implementation of SAC, the minimum over all Q functions is used instead: https://github.com/DLR-RM/stable-baselines3/blob/5ef10c8e69b52e1376e6c2c636737d6dd528dda1/stable_baselines3/sac/sac.py#L265
Was this a deliberate design decision, or are both methods viable?
Thanks,
Jakob

Hi Jakob,

It depends on the specific task and setup. In some cases the mean works better, while in others the min works better.
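
For readers comparing the two variants, here is a minimal sketch of how the choice changes the actor loss. It is not the code from this repository or from SB3: it uses NumPy for illustration, and the function name `actor_loss`, the `reduction` argument, and the toy values are all hypothetical. It assumes the usual SAC actor objective, the batch mean of `temperature * log_prob - q`.

```python
import numpy as np


def actor_loss(qs, log_probs, temperature, reduction="mean"):
    """SAC-style actor loss for an ensemble of critics (illustrative only).

    qs:        array of shape (num_critics, batch_size), one row per Q function
    log_probs: array of shape (batch_size,), log pi(a|s) of the sampled actions
    reduction: "mean" averages the critics, "min" takes the pessimistic estimate
    """
    if reduction == "mean":
        q = qs.mean(axis=0)  # average over the critic ensemble
    elif reduction == "min":
        q = qs.min(axis=0)   # elementwise minimum over the critic ensemble
    else:
        raise ValueError(f"unknown reduction: {reduction!r}")
    # The actor minimizes temperature * log_prob - q, averaged over the batch.
    return float((temperature * log_probs - q).mean())


# Toy values (made up): two critics, batch of three states.
qs = np.array([[1.0, 2.0, 3.0],
               [3.0, 2.0, 1.0]])
log_probs = np.zeros(3)

loss_mean = actor_loss(qs, log_probs, temperature=0.2, reduction="mean")
loss_min = actor_loss(qs, log_probs, temperature=0.2, reduction="min")
```

Since the minimum over critics is never larger than their mean, the "min" variant gives the actor a more pessimistic value estimate wherever the critics disagree; which of the two trains better is task-dependent, as noted above.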

[Question] Actor updates Q function mean vs min