Feature: Checkpointing on task level.

Question

PhilipMay opened this issue 2 months ago

I would like to request / suggest the following new feature:

Background

When I use cheap Azure low-priority instances or AWS spot instances, they may be preempted. If that happens, the evaluation has to restart from the beginning.

New Feature

It would be cool to write a "checkpoint" after every task. Then, if multiple tasks are evaluated (for example with open_llm_leaderboard_tasks), a restarted run could load the results of the tasks that have already been evaluated, and we would not have to restart from the beginning.

Clémentine Fourrier · Answer 1 · Sat Apr 20 2024 02:09:51 GMT+0800 (China Standard Time)

Hi!
This would be very hard to do: for efficiency, we run inference for all requests of the same type in one batch, and only then compute the metrics. Such a system would require us to rewrite the entire code base while losing overall speed.
I don't think we will consider it. I suggest you launch evaluations one task at a time if you have such needs.
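
For readers who want to try the one-task-at-a-time approach, here is a minimal sketch of a wrapper that runs tasks individually and skips those that already have a saved result. It is not part of the library: `evaluate_with_task_checkpoints`, the `run_task` callable, and the checkpoint file layout are all hypothetical names chosen for illustration. `run_task` stands in for however you launch a single-task evaluation and is assumed to return a JSON-serializable dict.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable


def evaluate_with_task_checkpoints(
    tasks: Iterable[str],
    run_task: Callable[[str], dict],  # hypothetical: runs ONE task, returns its results
    checkpoint_dir: Path,
) -> Dict[str, dict]:
    """Run each task separately and store its result on disk when it finishes."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[str, dict] = {}
    for task in tasks:
        # Hypothetical naming scheme: one JSON file per task.
        safe_name = task.replace("/", "_").replace("|", "_")
        checkpoint = checkpoint_dir / f"{safe_name}.json"

        if checkpoint.exists():
            # Task finished in an earlier (possibly preempted) run: reuse it.
            results[task] = json.loads(checkpoint.read_text())
            continue

        result = run_task(task)

        # Write to a temporary file first, then rename, so a preemption
        # in the middle of the write never leaves a half-written checkpoint.
        tmp = checkpoint_dir / (checkpoint.name + ".tmp")
        tmp.write_text(json.dumps(result))
        tmp.replace(checkpoint)
        results[task] = result
    return results
```

After a preemption, rerunning the same script with the same `checkpoint_dir` only repeats the task that was interrupted. The trade-off is the one described above: each task is evaluated on its own, so requests are no longer batched across tasks.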

Philip May · Answer 2 · Sat Apr 20 2024 03:31:46 GMT+0800 (China Standard Time)

Ok. So let's close this again?