Farama-Foundation / MO-Gymnasium

Multi-objective Gymnasium environments for reinforcement learning

Home Page: http://mo-gymnasium.farama.org/


Analysis: MO-Hopper reward vector

Kallinteris-Andreas opened this issue

This analysis is theoretical and backed up by tests

The multi-objective Hopper's reward vector contains three elements:

1. $r_{forward}$
2. $c_{control}$
3. height (instead of $r_{healthy}$)
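For reference, a minimal sketch of inspecting that vector reward (the `mo-hopper-v4` environment id and the top-level `mo_gymnasium.make` usage are assumptions for illustration):

```python
import mo_gymnasium as mo_gym

# Environment id is an assumption for this sketch.
env = mo_gym.make("mo-hopper-v4")
obs, info = env.reset(seed=0)

# A single random step; the reward is a length-3 vector
# ([r_forward, c_control, height] per the list above) rather than a scalar.
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(reward.shape)  # expected: (3,)
```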

I assume element 3 was used as a proxy for $r_{healthy}$, because in environment version 4 (and earlier) $r_{healthy}$ was bugged (https://github.com/Farama-Foundation/Gymnasium/issues/526); otherwise the hopper could not learn to balance.

This has been fixed in version 5, so it may now work with $r_{healthy}$ instead of height.

This is important because it will indicate how more complex environments should be designed, like Ant and Humanoid (which have a healthy reward).

@LucasAlegre

But I see we have the exact same bug in the original environments: the reward is always given in the last time step, even if the agent is unhealthy.

We will release our v5 of the environments as soon as Gymnasium v1.0 is out. Thanks!

Do you want to keep the torso's height as a reward element?

> Do you want to keep the torso's height as a reward element?

Yes, the idea is that then you can have a range of policies that trade-off between jumping forward (x-axis) vs. jumping higher (z-axis).

Is your goal:

1) to learn a policy that maximizes the Gymnasium Hopper return, or
2) to learn a policy that maximizes a different return?

Because if it is the first, the current reward vector does not make sense.

When the weight assigned to the third reward component is greater than zero, it is indeed a different return. The Gymnasium return is recovered when the weight assigned to the third objective is zero. Our goal in MORL is to learn policies for any linear combination of these three rewards.
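As a concrete illustration of that last point, here is a sketch of linearly scalarizing the vector reward; the `LinearReward` wrapper and its `weight` argument follow the MO-Gymnasium README, while the component ordering/signs are assumed to match the list above. Setting the third weight to zero yields a Gymnasium-style scalar return.

```python
import numpy as np
import mo_gymnasium as mo_gym

env = mo_gym.make("mo-hopper-v4")  # env id assumed, as above

# Weight vector over [r_forward, c_control, height] (ordering assumed).
# With the third weight at 0, the scalarized reward ignores height,
# i.e. only the forward and control terms contribute.
weights = np.array([1.0, 1.0, 0.0])

# LinearReward scalarizes the vector reward as dot(weight, reward).
scalar_env = mo_gym.LinearReward(env, weight=weights)

obs, info = scalar_env.reset(seed=0)
obs, reward, terminated, truncated, info = scalar_env.step(scalar_env.action_space.sample())
print(float(reward))  # a single scalar reward per step
```

Sweeping different weight vectors (e.g. putting more mass on the height component) is what produces the range of trade-off policies between jumping forward and jumping higher mentioned above.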

Since #92 explains the mapping, I am closing this issue.