takuseno / d3rlpy

An offline deep reinforcement learning library

Home Page: https://takuseno.github.io/d3rlpy


[Question] Reproducing FQE

risufaj opened this issue · comments

Hi,

First of all, thank you for this great package!

I'm trying to reproduce the plots of the FQE tutorial: https://d3rlpy.readthedocs.io/en/latest/tutorials/offline_policy_selection.html
but I don't see how to get the results for epochs 1, 5, and 10. Could you please help me out?

This is the code I'm using, inspired by the code in #365:

import numpy as np
import d3rlpy

DIM_OBSERVATION = 10  # added here: the original snippet used this without defining it
DIM_ACTION = 1
N_SAMPLES = 100

# random toy dataset
observations = np.random.random((N_SAMPLES, DIM_OBSERVATION))
actions = np.random.random((N_SAMPLES, DIM_ACTION))
rewards = np.random.random(N_SAMPLES)
terminals = np.random.randint(2, size=N_SAMPLES)

train_dataset = d3rlpy.dataset.MDPDataset(
    observations=observations,
    actions=actions,
    rewards=rewards,
    terminals=terminals,
    action_space=d3rlpy.ActionSpace.CONTINUOUS,
)

eval_dataset = train_dataset

# train the policy to be evaluated
model = d3rlpy.algos.IQLConfig().create(device='cpu')
model.fit(
    train_dataset,
    n_steps=N_SAMPLES,
    n_steps_per_epoch=N_SAMPLES,
)

# off-policy evaluation of the trained policy with FQE
fqe = d3rlpy.ope.FQE(algo=model, config=d3rlpy.ope.FQEConfig())

fqe.fit(
    eval_dataset,
    n_steps=N_SAMPLES,
    evaluators={
        'init_value': d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
        'soft_opc': d3rlpy.metrics.SoftOPCEvaluator(return_threshold=-300),
    },
)

@risufaj Hi, thank you for the issue. Here is the procedure to make a similar plot (a code sketch follows the list):

  • Train an agent (say, IQL).
  • Train a separate FQE instance with the agent saved at epochs 1, 5, and 10.
  • Compare the saved metrics across those individual FQE instances.
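
For concreteness, a minimal sketch of that procedure, reusing train_dataset / eval_dataset from the snippet above. It assumes that calling fit() repeatedly keeps training the same agent; the snapshot file names, experiment names, epoch choices, and step counts are only illustrative:

import d3rlpy

N_STEPS_PER_EPOCH = 1000

model = d3rlpy.algos.IQLConfig().create(device='cpu')

# 1) Train the agent in chunks and snapshot it at epochs 1, 5 and 10.
snapshots = {}
trained_epochs = 0
for target_epoch in (1, 5, 10):
    # train only the epochs still missing up to the target epoch
    model.fit(
        train_dataset,
        n_steps=(target_epoch - trained_epochs) * N_STEPS_PER_EPOCH,
        n_steps_per_epoch=N_STEPS_PER_EPOCH,
    )
    trained_epochs = target_epoch
    path = f"iql_epoch{target_epoch}.d3"
    model.save(path)  # snapshot the agent at this epoch
    snapshots[target_epoch] = path

# 2) Train one FQE instance per snapshot and record its metrics.
for epoch, path in snapshots.items():
    agent = d3rlpy.load_learnable(path)  # reload the saved snapshot
    fqe = d3rlpy.ope.FQE(algo=agent, config=d3rlpy.ope.FQEConfig())
    fqe.fit(
        eval_dataset,
        n_steps=10 * N_STEPS_PER_EPOCH,
        n_steps_per_epoch=N_STEPS_PER_EPOCH,
        experiment_name=f"FQE_iql_epoch{epoch}",
        evaluators={
            'init_value': d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
            'soft_opc': d3rlpy.metrics.SoftOPCEvaluator(return_threshold=-300),
        },
    )

# 3) Compare the logged init_value / soft_opc curves of the three FQE runs
#    (one curve per IQL epoch) to reproduce the tutorial plot.

Each fqe.fit() call logs its evaluator values in its own directory (by default under d3rlpy_logs/), which is what gets compared in the last step.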

@takuseno considering the plot of init_value of https://github.com/takuseno/d3rlpy/blob/c31ad8ab7186198b994383ce9fa2c88af196b6d9/docs/assets/fqe_cartpole_init_value.png

the plot has 3 lines, which correspond to epochs 1, 5, and 10, but what does the x-axis mean? If the x-axis is also epochs, I'm wondering how to produce this plot as well.

@ericyue I think the epochs on the x-axis of that figure are the FQE training epochs. So you first train your model with IQL, following the same example as @takuseno. After that, you use the agents from epochs 1, 5, and 10 to train FQE, and the plot then shows how each of those agents scores as its FQE training progresses. But hopefully @takuseno can also reply to this.
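
To make that concrete, a rough sketch of how such a plot could be drawn from the logged metrics of the three FQE runs in the sketch above. The log layout (one init_value.csv per run with epoch,step,value rows under d3rlpy_logs/) and the directory names are assumptions; adjust the paths to your actual run directories, which by default include a timestamp:

import csv
import matplotlib.pyplot as plt

# one FQE run (and hence one curve) per IQL snapshot
runs = {
    "IQL epoch 1": "d3rlpy_logs/FQE_iql_epoch1/init_value.csv",
    "IQL epoch 5": "d3rlpy_logs/FQE_iql_epoch5/init_value.csv",
    "IQL epoch 10": "d3rlpy_logs/FQE_iql_epoch10/init_value.csv",
}

for label, path in runs.items():
    with open(path) as f:
        rows = list(csv.reader(f))
    epochs = [int(r[0]) for r in rows]    # FQE training epoch (x-axis)
    values = [float(r[2]) for r in rows]  # estimated initial state value
    plt.plot(epochs, values, label=label)

plt.xlabel("FQE epoch")
plt.ylabel("init_value")
plt.legend()
plt.savefig("fqe_init_value.png")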

@risufaj Thank you for the explanation. That's exactly right!

Got it, thanks @risufaj

Let me close this issue since the original problem has been resolved. Feel free to reopen this if there is any further discussion.