[Question] Reproducing FQE
risufaj opened this issue · comments
Hi,
First of all, thank you for this great package!
I'm trying to reproduce the plots of the FQE tutorial: https://d3rlpy.readthedocs.io/en/latest/tutorials/offline_policy_selection.html
but I don't see how to get the results for epochs 1, 5, and 10. Could you please help me out?
This is the code I'm using, inspired by the code in #365:
import numpy as np
import d3rlpy

# NOTE: DIM_OBSERVATION was undefined in my snippet; the value here is arbitrary.
DIM_OBSERVATION = 4
DIM_ACTION = 1
N_SAMPLES = 100

observations = np.random.random((N_SAMPLES, DIM_OBSERVATION))
actions = np.random.random((N_SAMPLES, DIM_ACTION))
rewards = np.random.random(N_SAMPLES)
terminals = np.random.randint(2, size=N_SAMPLES)

train_dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
    action_space=d3rlpy.ActionSpace.CONTINUOUS,
)
eval_dataset = train_dataset

model = d3rlpy.algos.IQLConfig().create(device='cpu')
model.fit(
    train_dataset,
    n_steps=N_SAMPLES,
    n_steps_per_epoch=N_SAMPLES,
)

fqe = d3rlpy.ope.FQE(algo=model, config=d3rlpy.ope.FQEConfig())
fqe.fit(
    eval_dataset,
    n_steps=N_SAMPLES,
    evaluators={
        'init_value': d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
        'soft_opc': d3rlpy.metrics.SoftOPCEvaluator(return_threshold=-300),
    },
)
@risufaj Hi, thank you for the issue. Here is the procedure to make a similar plot:
- Train an agent (say, IQL).
- Train a separate FQE instance with the agent saved at epoch 1, 5, and 10.
- Compare the metrics logged by those individual FQE instances.
@takuseno Considering the init_value plot at https://github.com/takuseno/d3rlpy/blob/c31ad8ab7186198b994383ce9fa2c88af196b6d9/docs/assets/fqe_cartpole_init_value.png
the plot has 3 lines, one for each of epochs 1, 5, and 10, but what does the x-axis mean? Is it also epochs? I'm wondering how to produce this plot as well.
@ericyue I think the epochs on the x-axis of that figure are the FQE training epochs. So you first train your model with IQL, keeping the same example as @takuseno. After that, you use the agents from epochs 1, 5, and 10 to train your FQE, and the plot shows how each of these agents performed when evaluated with FQE. But hopefully @takuseno can also confirm this.
@risufaj Thank you for the explanation. That's exactly right!
Let me close this issue since the original problem has been resolved. Feel free to reopen this if there is any further discussion.