huggingface / jat

General multi-task deep RL Agent

Aggregate evaluation metrics from different environments/tasks

qgallouedec opened this issue

Training produces evaluation data from several environments/tasks. We want to aggregate all these per-task evaluations into a statistically sound measure of overall performance. To do this, we could use rliable, perhaps wrapped in a general evaluator (see the sketch below).
One difficulty could be the number of independent training runs from scratch needed to make the statistics meaningful.
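
As a rough illustration (not a proposal for the final evaluator API), here is a minimal sketch of what the aggregation could look like with rliable. It assumes per-task scores have already been normalized (e.g. expert-normalized) and collected into an array of shape `(num_runs, num_tasks)`; the `5 x 10` random array and the `"jat"` key are placeholders.

```python
import numpy as np
from rliable import library as rly
from rliable import metrics

# Placeholder data: 5 independent training runs evaluated on 10 tasks,
# with scores assumed to be normalized per task beforehand.
score_dict = {"jat": np.random.rand(5, 10)}

# Aggregate metrics recommended by rliable: median, IQM, mean, optimality gap.
aggregate_func = lambda scores: np.array([
    metrics.aggregate_median(scores),
    metrics.aggregate_iqm(scores),
    metrics.aggregate_mean(scores),
    metrics.aggregate_optimality_gap(scores),
])

# Point estimates plus stratified bootstrap confidence intervals.
point_estimates, interval_estimates = rly.get_interval_estimates(
    score_dict, aggregate_func, reps=2000
)
print(point_estimates)     # {"jat": [median, IQM, mean, optimality gap]}
print(interval_estimates)  # {"jat": lower/upper bounds for each aggregate}
```

The confidence intervals only become informative with several independent runs per task, which is exactly the retraining cost mentioned above.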