takuseno / d3rlpy

An offline deep reinforcement learning library

Home Page: https://takuseno.github.io/d3rlpy

[QUESTION] difference between evaluators and OPE?

ericyue opened this issue · comments

I'm building an offline RL model from custom collected log data.
I'm not sure how to assess the trained model's performance. One way is to add evaluators such as TDErrorEvaluator; the other way is to train a separate d3rlpy.ope.FQE model on the test dataset and look at soft_opc or other metrics.
Considering that both approaches work with the test dataset and compute some metric, which one should I use to evaluate the model?

import d3rlpy
from d3rlpy.metrics import (
    TDErrorEvaluator,
    AverageValueEstimationEvaluator,
    InitialStateValueEstimationEvaluator,
)

# build the BCQ algorithm from its config
model = d3rlpy.algos.BCQConfig(xxxx).create()
ret = model.fit(
    train_dataset,
    n_steps=N_STEPS,
    n_steps_per_epoch=N_STEPS_PER_EPOCH,
    logger_adapter=logger_adapter,
    save_interval=10,
    evaluators={
        "test_td_error": TDErrorEvaluator(episodes=test_dataset.episodes),
        "test_value_scale": AverageValueEstimationEvaluator(episodes=test_dataset.episodes),
        "test_init_value": InitialStateValueEstimationEvaluator(episodes=test_dataset.episodes),
    },
)
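
For comparison, the second approach mentioned above (off-policy evaluation with FQE) would look roughly like the sketch below. This is a minimal sketch assuming the d3rlpy v2 API; the return_threshold and n_steps values are placeholders that would need tuning for your own dataset.

# train FQE on the held-out test episodes to evaluate the fitted BCQ policy
fqe = d3rlpy.ope.FQE(algo=model, config=d3rlpy.ope.FQEConfig())
fqe.fit(
    test_dataset,
    n_steps=100000,  # placeholder training budget for the FQE critic
    evaluators={
        # soft off-policy classification: how well estimated Q-values separate
        # episodes whose return exceeds the (placeholder) threshold from the rest
        "soft_opc": d3rlpy.metrics.SoftOPCEvaluator(return_threshold=100),
        # estimated value of the policy at initial states
        "init_value": d3rlpy.metrics.InitialStateValueEstimationEvaluator(),
    },
)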

@ericyue Hi, thanks for the issue. I would redirect you to the following papers since this is a general offline RL question.

@takuseno I have similar doubts to @ericyue's. Could you please elaborate a little more? The papers are interesting but very theoretical; could you provide a more practical example?