takuseno / d3rlpy

An offline deep reinforcement learning library

Home Page: https://takuseno.github.io/d3rlpy

[QUESTION] difference between evaluators and OPE?

ericyue opened this issue · comments

I'm building an offline RL model from custom collected log data.
I'm not sure how to assess the trained model's performance. One way is to add evaluators such as TDErrorEvaluator; the other way is to train a separate d3rlpy.ope.FQE model on the test dataset and look at soft_opc or other metrics.
Considering that both approaches work with the test dataset and compute some metric, which one should I use to evaluate the model?

import d3rlpy
from d3rlpy.metrics import (
    TDErrorEvaluator,
    AverageValueEstimationEvaluator,
    InitialStateValueEstimationEvaluator,
)

# build the BCQ algorithm from its config
model = d3rlpy.algos.BCQConfig(xxxx).create()
ret = model.fit(
    train_dataset,
    n_steps=N_STEPS,
    n_steps_per_epoch=N_STEPS_PER_EPOCH,
    logger_adapter=logger_adapter,
    save_interval=10,
    evaluators={
        "test_td_error": TDErrorEvaluator(episodes=test_dataset.episodes),
        "test_value_scale": AverageValueEstimationEvaluator(episodes=test_dataset.episodes),
        "test_init_value": InitialStateValueEstimationEvaluator(episodes=test_dataset.episodes),
    },
)
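
For comparison, the second approach mentioned above (off-policy evaluation with FQE) would look roughly like the sketch below. This is a minimal sketch assuming the d3rlpy v2 API; the return_threshold and n_steps values are placeholders that would need tuning for your own dataset.

# train FQE on the held-out test episodes to evaluate the fitted BCQ policy
fqe = d3rlpy.ope.FQE(algo=model, config=d3rlpy.ope.FQEConfig())
fqe.fit(
    test_dataset,
    n_steps=100000,  # placeholder training budget for the FQE critic
    evaluators={
        # soft off-policy classification: how well estimated Q-values separate
        # episodes whose return exceeds the (placeholder) threshold from the rest
        "soft_opc": d3rlpy.metrics.SoftOPCEvaluator(return_threshold=100),
        # estimated value of the policy at initial states
        "init_value": d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
    },
)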

@ericyue Hi, thanks for the issue. I would redirect you to the following papers since this is a general offline RL question.

@takuseno I have similar doubts to @ericyue's. Could you please elaborate a little more? The papers are interesting but very theoretical; could you provide a more practical example?