takuseno / d3rlpy

An offline deep reinforcement learning library

Home Page: https://takuseno.github.io/d3rlpy


[Question] Reproducing FQE

risufaj opened this issue · comments

Hi,

First of all, thank you for this great package!

I'm trying to reproduce the plots of the FQE tutorial: https://d3rlpy.readthedocs.io/en/latest/tutorials/offline_policy_selection.html
but I don't see how to get the results for epochs 1, 5, and 10. Could you please help me out?

This is the code I'm using, inspired by the code in #365:

import numpy as np
import d3rlpy

DIM_OBSERVATION = 10  # added here: the original snippet used this without defining it
DIM_ACTION = 1
N_SAMPLES = 100

# random toy dataset
observations = np.random.random((N_SAMPLES, DIM_OBSERVATION))
actions = np.random.random((N_SAMPLES, DIM_ACTION))
rewards = np.random.random(N_SAMPLES)
terminals = np.random.randint(2, size=N_SAMPLES)

train_dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
    action_space=d3rlpy.ActionSpace.CONTINUOUS,
)

eval_dataset = train_dataset

# train the policy to be evaluated
model = d3rlpy.algos.IQLConfig().create(device='cpu')
model.fit(
    train_dataset,
    n_steps=N_SAMPLES,
    n_steps_per_epoch=N_SAMPLES,
)

# off-policy evaluation of the trained policy with FQE
fqe = d3rlpy.ope.FQE(algo=model, config=d3rlpy.ope.FQEConfig())

fqe.fit(
    eval_dataset,
    n_steps=N_SAMPLES,
    evaluators={
        'init_value': d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
        'soft_opc': d3rlpy.metrics.SoftOPCEvaluator(return_threshold=-300),
    },
)

@risufaj Hi, thank you for the issue. Here is the procedure to make a similar plot (a code sketch follows the list):

  • Train an agent (say, IQL).
  • Train a separate FQE instance with the agent saved at epochs 1, 5, and 10.
  • Compare the saved metrics across those individual FQE instances.
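
For concreteness, a minimal sketch of that procedure, reusing train_dataset / eval_dataset from the snippet above. It assumes that calling fit() repeatedly keeps training the same agent; the snapshot file names, experiment names, epoch choices, and step counts are only illustrative:

import d3rlpy

N_STEPS_PER_EPOCH = 1000

model = d3rlpy.algos.IQLConfig().create(device='cpu')

# 1) Train the agent in chunks and snapshot it at epochs 1, 5 and 10.
snapshots = {}
trained_epochs = 0
for target_epoch in (1, 5, 10):
    # train only the epochs still missing up to the target epoch
    model.fit(
        train_dataset,
        n_steps=(target_epoch - trained_epochs) * N_STEPS_PER_EPOCH,
        n_steps_per_epoch=N_STEPS_PER_EPOCH,
    )
    trained_epochs = target_epoch
    path = f"iql_epoch{target_epoch}.d3"
    model.save(path)  # snapshot the agent at this epoch
    snapshots[target_epoch] = path

# 2) Train one FQE instance per snapshot and record its metrics.
for epoch, path in snapshots.items():
    agent = d3rlpy.load_learnable(path)  # reload the saved snapshot
    fqe = d3rlpy.ope.FQE(algo=agent, config=d3rlpy.ope.FQEConfig())
    fqe.fit(
        eval_dataset,
        n_steps=10 * N_STEPS_PER_EPOCH,
        n_steps_per_epoch=N_STEPS_PER_EPOCH,
        experiment_name=f"FQE_iql_epoch{epoch}",
        evaluators={
            'init_value': d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
            'soft_opc': d3rlpy.metrics.SoftOPCEvaluator(return_threshold=-300),
        },
    )

# 3) Compare the logged init_value / soft_opc curves of the three FQE runs
#    (one curve per IQL epoch) to reproduce the tutorial plot.

Each fqe.fit() call logs its evaluator values in its own directory (by default under d3rlpy_logs/), which is what gets compared in the last step.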

@takuseno considering the plot of init_value of https://github.com/takuseno/d3rlpy/blob/c31ad8ab7186198b994383ce9fa2c88af196b6d9/docs/assets/fqe_cartpole_init_value.png

the plot has 3 lines, which correspond to epochs 1, 5, and 10, but what does the x-axis mean? If the x-axis is also epochs, I'm wondering how to produce this plot as well.

@ericyue I think the epochs on the x-axis of that figure are the FQE training epochs. So you first train your model with IQL, following the same example as @takuseno. After that, you use the agents from epochs 1, 5, and 10 to train FQE, and the plot then shows how each of those agents scores as its FQE training progresses. But hopefully @takuseno can also reply to this.
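
To make that concrete, a rough sketch of how such a plot could be drawn from the logged metrics of the three FQE runs in the sketch above. The log layout (one init_value.csv per run with epoch,step,value rows under d3rlpy_logs/) and the directory names are assumptions; adjust the paths to your actual run directories, which by default include a timestamp:

import csv
import matplotlib.pyplot as plt

# one FQE run (and hence one curve) per IQL snapshot
runs = {
    "IQL epoch 1": "d3rlpy_logs/FQE_iql_epoch1/init_value.csv",
    "IQL epoch 5": "d3rlpy_logs/FQE_iql_epoch5/init_value.csv",
    "IQL epoch 10": "d3rlpy_logs/FQE_iql_epoch10/init_value.csv",
}

for label, path in runs.items():
    with open(path) as f:
        rows = list(csv.reader(f))
    epochs = [int(r[0]) for r in rows]    # FQE training epoch (x-axis)
    values = [float(r[2]) for r in rows]  # estimated initial state value
    plt.plot(epochs, values, label=label)

plt.xlabel("FQE epoch")
plt.ylabel("init_value")
plt.legend()
plt.savefig("fqe_init_value.png")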

@risufaj Thank you for the explanation. That's exactly right!

Got it, thanks @risufaj

Let me close this issue since the original problem has been resolved. Feel free to reopen this if there is any further discussion.