EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Home Page: https://www.eleuther.ai

accelerate doesn't work with auto:(>1)

ozgurcelik opened this issue

Hi. I noticed that accelerate launch works perfectly when I set batch_size = "auto" but gets stuck at the very end when I use batch_size = "auto:2". The problem persists whether I go through evaluator.simple_evaluate or call accelerate launch -m lm_eval from the terminal. This is a problem since different tasks may have different optimal batch sizes.

```
Passed argument batch_size = auto:2.0. Detecting largest batch size
Running loglikelihood requests:   0%|          | 0/1644 [00:00<?, ?it/s]
Passed argument batch_size = auto:2.0. Detecting largest batch size
Passed argument batch_size = auto:2.0. Detecting largest batch size
Passed argument batch_size = auto:2.0. Detecting largest batch size
Determined largest batch size: 16
Determined largest batch size: 16
Determined largest batch size: 16
Determined largest batch size: 16
Running loglikelihood requests:  40%|████      |  657/1644 [00:27<00:26,  37.06it/s]
Passed argument batch_size = auto:2.0. Detecting largest batch size
Passed argument batch_size = auto:2.0. Detecting largest batch size
Running loglikelihood requests:  41%|████      |  673/1644 [00:28<00:26,  37.21it/s]
Passed argument batch_size = auto:2.0. Detecting largest batch size
Running loglikelihood requests:  42%|████▏     |  689/1644 [00:28<00:25,  37.21it/s]
Passed argument batch_size = auto:2.0. Detecting largest batch size
Determined largest batch size: 16
Determined largest batch size: 16
Determined largest batch size: 16
Determined largest batch size: 16
Running loglikelihood requests:  90%|█████████ | 1478/1644 [00:41<00:00, 209.33it/s]
Passed argument batch_size = auto:2.0. Detecting largest batch size
Running loglikelihood requests:  97%|█████████▋| 1598/1644 [00:41<00:00, 234.70it/s]
Passed argument batch_size = auto:2.0. Detecting largest batch size
Running loglikelihood requests: 100%|██████████| 1644/1644 [00:41<00:00,  39.17it/s]
Map: 100%|██████████| 5000/5000 [00:01<00:00, 3971.36 examples/s]
Map: 100%|██████████| 5000/5000 [00:01<00:00, 3402.27 examples/s]
```

This is what it looks like when it gets stuck with auto:2. It unnecessarily re-runs the batch-size search near the very end and never finishes the task.
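For concreteness, the Python-API path I mean looks roughly like this (a minimal sketch run under accelerate launch; the model and task names are just placeholders, not the exact setup):

```python
# Minimal sketch of the simple_evaluate path; model/task names are
# placeholders. Run under `accelerate launch` on multiple GPUs.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=mistralai/Mistral-7B-v0.1",
    tasks=["hellaswag"],
    batch_size="auto",      # works fine
    # batch_size="auto:2",  # hangs near the end of the run
)
```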

You can just use auto; see https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/interface.md for details.
auto:2 means the batch-size search is run twice 😅
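Schematically, auto:N re-runs the search N times over the length-sorted request list (a rough sketch of the idea, not the harness's actual code; detect_batch_size is a hypothetical stand-in for the internal search):

```python
# Rough sketch of the auto:N idea; `detect_batch_size` is a hypothetical
# stand-in for the harness's internal batch-size search, not a real API.
from typing import Callable, List

def run_with_auto_n(requests: List[str], n_searches: int,
                    detect_batch_size: Callable[[str], int]) -> None:
    # Requests run longest-first, so later chunks hold shorter samples
    # and may fit a larger batch.
    requests = sorted(requests, key=len, reverse=True)
    chunk = max(1, len(requests) // n_searches)
    for start in range(0, len(requests), chunk):
        bs = detect_batch_size(requests[start])  # search re-runs once per chunk
        for i in range(start, min(start + chunk, len(requests)), bs):
            batch = requests[i:i + bs]
            ...  # forward pass over `batch`
```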

Correct me if I'm wrong, but as the evaluation proceeds, the samples may get shorter, so more of them might fit in a batch later on. I was using auto:2 for exactly such cases, precisely because I want the maximum batch size to be searched again.

I also ran into this problem. After all the loglikelihood requests finish, the process hangs with no further output while CPU/GPU utilization stays pegged.
Mistral-7B-v0.1 on MMLU with auto:4 hits this problem, while hellaswag with auto:4 does not. Replacing auto:4 with auto solves it.
I believe there is a bug.

Hi! I'll look into this. I suspect the padding across ranks is slightly off somewhere, or that the batch sizes get out of sync between ranks.
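If it's the latter, one conventional way to keep ranks in lockstep would be to all-reduce the detected size with MIN so every rank settles on the same value and runs the same number of collective steps (just a sketch of the idea, not a committed fix):

```python
# Sketch: agree on one batch size across ranks so no rank executes a
# different number of collective steps. Uses torch.distributed directly.
import torch
import torch.distributed as dist

def synced_batch_size(local_bs: int, device: torch.device) -> int:
    if not (dist.is_available() and dist.is_initialized()):
        return local_bs
    t = torch.tensor([local_bs], dtype=torch.long, device=device)
    dist.all_reduce(t, op=dist.ReduceOp.MIN)  # smallest size found anywhere
    return int(t.item())
```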

@ozgurcelik, do you have a sample command that exhibits this problem?