[Question] Observation problems in Pendulum-v1
Ian-Sy-Zhang opened this issue · comments
Question
According to the Gymnasium documentation:
the 0th item in the observation space is 'x = cos(theta)'
the 1st item in the observation space is 'y = sin(angle)'
I didn't see anything in the documentation saying that 'theta' and 'angle' are two different things.
If theta is the same as angle, then x^2 + y^2 should equal 1.
import gymnasium as gym
import numpy as np

env = gym.make('Pendulum-v1')
incorrect_count = 0
for _ in range(100):
    state = env.observation_space.sample()
    cos_theta = state[0]
    sin_theta = state[1]
    sum_of_squares = cos_theta**2 + sin_theta**2
    print(f"Sum of squares: {sum_of_squares}")
    if np.isclose(sum_of_squares, 1.0, atol=0.1):
        print("Sample is correct.")
    else:
        print("Sample is incorrect.")
        incorrect_count += 1
print(incorrect_count)
The result shows that in 100 samples, 78 are incorrect.
So the questions are:
- Does 'theta' mean the same thing as 'angle' in the documentation?
- If the answer to question 1 is 'yes', then why is sin(\theta)^2 + cos(\theta)^2 != 1?
- If so, is there a problem in the sample function?
1. Yes, theta is the same as angle in the documentation.
2 and 3. To generate an observation you are using env.observation_space.sample(). However, all this produces is a possible observation within the bounds of the space, not necessarily a valid observation for the environment. Therefore, it doesn't necessarily generate an observation that satisfies the trig identity sin(theta)^2 + cos(theta)^2 = 1.
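To illustrate the point above without depending on Gymnasium itself, here is a minimal sketch using only the standard library. It assumes the Pendulum-v1 observation space is a box with bounds [-1, 1] for x and y and [-8, 8] for angular velocity, and that sampling draws each dimension independently and uniformly. The helper names (sample_box, observation_from_state, fraction_on_unit_circle) are hypothetical and are not part of any library.

```python
import math
import random


def sample_box(rng):
    """Mimic sampling from a bounded box: each dimension is drawn independently.

    Assumed bounds: x, y in [-1, 1], angular velocity in [-8, 8].
    """
    return (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), rng.uniform(-8.0, 8.0))


def observation_from_state(theta, theta_dot):
    """Build an observation from a physical state, as an environment would."""
    return (math.cos(theta), math.sin(theta), theta_dot)


def fraction_on_unit_circle(n_samples, atol=0.1, seed=0):
    """Fraction of box samples whose x^2 + y^2 is within atol of 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y, _ = sample_box(rng)
        if abs(x * x + y * y - 1.0) <= atol:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    # Most independent (x, y) draws do not lie near the unit circle.
    print("fraction near unit circle:", fraction_on_unit_circle(100_000))

    # An observation derived from an actual angle always satisfies the identity.
    x, y, _ = observation_from_state(theta=1.234, theta_dot=0.5)
    print("x^2 + y^2 from a real state:", x * x + y * y)
```

Only a minority of the box samples land close to the unit circle, which is consistent with most of the 100 samples in the question being flagged as incorrect.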
Correct code
import gymnasium as gym
import numpy as np

env = gym.make("Pendulum-v1")
# Observations returned by reset() and step() come from the real pendulum state
obs, _ = env.reset()
assert np.isclose(obs[0]**2 + obs[1]**2, 1)
for _ in range(100):
    action = env.action_space.sample()
    obs, _, _, _, _ = env.step(action)
    assert np.isclose(obs[0]**2 + obs[1]**2, 1)
May I ask what rewards give the best convergence? My A3C agent finds it hard to surpass -200 (for episodes of no more than 200 steps).
Pendulum is a difficult exploration problem, so you might need to explore the environment more.
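For context on what a return of -200 means, here is a rough sketch of the per-step reward, assuming the formula given in the Gymnasium documentation for Pendulum-v1: r = -(theta^2 + 0.1 * theta_dot^2 + 0.001 * torque^2), with theta normalized to [-pi, pi], a maximum speed of 8, and a maximum torque of 2. The function and constant names are illustrative, not the library's API.

```python
import math

MAX_SPEED = 8.0    # assumed bound on angular velocity
MAX_TORQUE = 2.0   # assumed bound on the applied torque
EPISODE_STEPS = 200


def angle_normalize(theta):
    """Wrap an angle into the interval [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta, theta_dot, torque):
    """Per-step reward under the assumed documented formula (0 is the best value)."""
    theta = angle_normalize(theta)
    return -(theta ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2)


if __name__ == "__main__":
    # Worst case: hanging straight down, at maximum speed, with maximum torque.
    worst_step = pendulum_reward(math.pi, MAX_SPEED, MAX_TORQUE)
    print("worst per-step reward:", worst_step)
    print("worst 200-step return:", EPISODE_STEPS * worst_step)
    # Best case: upright, motionless, no torque.
    print("best per-step reward:", pendulum_reward(0.0, 0.0, 0.0))
```

Under these assumptions the return can never exceed 0, and part of the cost is unavoidable because each episode starts at a random angle and the pendulum must first be swung up.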