huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

Feature: Checkpointing on task level.

PhilipMay opened this issue · comments

I would like to request / suggest the following new feature:

Background

When I use cheap Azure low-priority instances or AWS spot instances, they might be preempted. If this happens, the evaluation must restart from the beginning.

New Feature

It would be cool to write a "checkpoint" for every task. Then, when multiple tasks are evaluated, as with open_llm_leaderboard_tasks, a restarted run could load the tasks that have already been evaluated, and we would not have to restart from the beginning.

Hi!
This would be very hard to do: for efficiency, we run inference for all requests of the same type in batches, and only then compute the metrics. Such a system would require us to rewrite the entire code base and would cost us overall speed.
I don't think we will consider it. I suggest you launch evaluations on one task at a time if you have such needs.
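
The one-task-at-a-time workaround can be scripted so that a preempted instance resumes where it left off. The following is only a minimal sketch, not part of LightEval: `run_tasks`, `load_done`, the marker file, and the `build_command` callable are all hypothetical names, and `build_command` is assumed to return whatever command line you already use to evaluate a single task.

```python
import json
import subprocess
from pathlib import Path


def load_done(marker_file: Path) -> set[str]:
    """Return the names of tasks already recorded as finished."""
    if marker_file.exists():
        return set(json.loads(marker_file.read_text()))
    return set()


def run_tasks(tasks, build_command, marker_file: Path, runner=subprocess.run):
    """Run each task in its own process, skipping tasks already marked done.

    build_command is a hypothetical callable (an assumption of this sketch)
    that maps a task name to the argv list of your own evaluation command.
    """
    done = load_done(marker_file)
    executed = []
    for task in tasks:
        if task in done:
            continue  # finished before the last preemption
        # check=True raises CalledProcessError on failure, so a failed
        # task is never recorded as done.
        runner(build_command(task), check=True)
        done.add(task)
        # Write to a temporary file first, then rename, so an interruption
        # during the write does not leave a truncated marker file.
        tmp = marker_file.with_suffix(".tmp")
        tmp.write_text(json.dumps(sorted(done)))
        tmp.replace(marker_file)
        executed.append(task)
    return executed
```

The marker file must live on storage that survives preemption (for example a mounted disk or network share). A task that is interrupted midway is simply rerun from its start on the next launch, so the granularity is one task, as suggested above.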

Ok. So let's close this again?