Harness does not work properly
RobinJing opened this issue · comments
Describe the bug
I want to use Harness for evaluation, but it does not work.
How to reproduce
Steps to reproduce the error:
- Using Conda:
- conda create an environment with Python 3.11
- conda activate the environment, then git clone the harness repo
- pip install -e .
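The conda steps above can be sketched as a script. The environment name and the clone URL are assumptions (substitute the harness fork you are actually testing):

```shell
# Create and activate a Python 3.11 environment (name "harness" is assumed)
conda create -n harness python=3.11 -y
conda activate harness

# Clone the harness repo and install it in editable mode
# (repo URL is a placeholder -- use the harness repo you are testing)
git clone https://github.com/EleutherAI/lm-evaluation-harness.git
cd lm-evaluation-harness
pip install -e .
```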

- Using the Docker b11 image:
- pip install -e . (completes successfully)
- python run_multi_llb.py --model ipex-llm --pretrained /model/DeepSeek-R1-Distill-Qwen-32B --precision sym_int4 --device xpu:0,1,2,3 --tasks mmlu --batch 1 --no_cache
- pip install datasets==2.21.0
- python run_multi_llb.py --model ipex-llm --pretrained /model/DeepSeek-R1-Distill-Qwen-32B --precision sym_int4 --device xpu:0,1,2,3 --tasks mmlu --batch 1 --no_cache
- pip install accelerate==0.26.0, then run run_multi_llb.py again
- pip install trl==0.11.0, then run run_multi_llb.py again
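Since several version pins are tried in sequence above, it can help to confirm which versions are actually installed before rerunning. A small check (pin list taken from the steps above; the loop itself is illustrative):

```shell
# Required pins from the reproduction steps above
required="datasets==2.21.0 accelerate==0.26.0 trl==0.11.0"

# Compare each pin against what pip reports for the active environment
for pin in $required; do
  name=${pin%%==*}   # package name before "=="
  want=${pin##*==}   # pinned version after "=="
  have=$(pip show "$name" 2>/dev/null | awk '/^Version:/{print $2}')
  if [ "$have" = "$want" ]; then
    echo "$name OK ($have)"
  else
    echo "$name MISMATCH: want $want, have ${have:-none}"
  fi
done
```

Any MISMATCH line means the pin did not take effect in the environment the script runs in (e.g. a different conda env is active).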
For the conda environment, I have set up the environment and the deployment is ready to go. After this, I can start the script with:
python run_multi_llb.py --model ipex-llm --pretrained /model/DeepSeek-R1-Distill-Qwen-32B --precision sym_int4 --device xpu:0,1,2,3 --tasks hellaswag --batch 1 --no_cache
but the program stops suddenly without any meaningful output:
You could try running it again with the latest b16 image. There is no need to install conda inside the Docker container; you can use pip install directly to test.
If you want to run one single large model across multiple cards, refer to this guide: https://github.com/intel/ipex-llm/tree/main/python/llm/dev/benchmark/ceval#multi-gpu-environment
run_multi_llb.py is intended to run multiple tasks across multiple cards, with one model and one task per card.
Hi, I used the b16 Docker image to perform the test. I simply start a vLLM backend server and use the harness 'local-completions' model as the frontend. When I run the winogrande task, it works fine; however, if I run the mmlu task, the vLLM server crashes:
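For reference, a minimal version of that setup might look like the following. The port, model name, and the exact server entrypoint are assumptions; the ipex-llm vLLM fork may use a different launch command, so adjust to your environment:

```shell
# Start an OpenAI-compatible vLLM server in the background
# (entrypoint and port are assumptions; the ipex-llm fork may differ)
python -m vllm.entrypoints.openai.api_server \
  --model /model/DeepSeek-R1-Distill-Qwen-32B \
  --port 8000 &

# Run the harness against it via the local-completions frontend
lm_eval --model local-completions \
  --model_args model=/model/DeepSeek-R1-Distill-Qwen-32B,base_url=http://localhost:8000/v1/completions,num_concurrent=1 \
  --tasks winogrande \
  --batch_size 1
```

Swapping `--tasks winogrande` for `--tasks mmlu` is what triggers the crash described above; mmlu fans out into many subtasks, so its memory footprint on the server side is larger.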
This looks like an OOM error. Can you provide the entire set of steps for how you run vLLM and how to reproduce this issue?
Synced offline. Please check the memory usage of each card to confirm whether the issue is OOM, and paste your environment/steps as Jian suggested.