huggingface / lighteval

LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.

The helm|piqa task is generative but has generation_size=-1.

yonatano opened this issue

The helm|piqa task listed in tasks_table.jsonl here: https://github.com/huggingface/lighteval/blob/a98210fd3a2d1e8bface1c32b72ebd5017173a4c/src/lighteval/tasks/tasks_table.jsonl#L797C1-L797C472

has "generation_size":-1 even though its "metric" list includes "exact_match". These two settings are mutually exclusive: exact_match is a generative metric, so the task needs a positive generation size.

For example, this command fails for me --

accelerate launch --multi_gpu --num_processes=8 run_evals_accelerate.py \
    --model_args "pretrained=gpt2" \
    --tasks "helm|piqa|0|1" \
    --override_batch_size 1 \
    --output_dir="./evals/"

with error:

ValueError: `max_new_tokens` must be greater than 0, but is -1.
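
The ValueError appears to come from transformers' generation-time validation rather than from lighteval itself: the generation_size of -1 ends up being passed along as max_new_tokens, which must be a positive integer. A minimal, illustrative repro outside the harness (using the same gpt2 model as in the command above):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustration only, not lighteval code: a non-positive max_new_tokens
# triggers the same ValueError inside transformers.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("The answer is", return_tensors="pt")
model.generate(**inputs, max_new_tokens=-1)
# ValueError: `max_new_tokens` must be greater than 0, but is -1.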

Thanks.

Hi! This sounds like an error on our side! If you have the time, could you take a look at the helm code base to see which generation size should be used?
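
For reference, the fix should just amount to replacing the -1 with whatever maximum completion length HELM specifies for PIQA. The sketch below only shows the shape of the change; the value 5 is a placeholder, not the confirmed number.

# Excerpt of the tasks_table.jsonl row, shown as Python dicts for illustration.
# The real generation_size must be taken from HELM's PIQA run spec.
current_row_excerpt  = {"generation_size": -1}  # conflicts with the generative metric
proposed_row_excerpt = {"generation_size": 5}   # 5 is a placeholder value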