intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.


all-in-one benchmark llama-3-8b-instruct issue with version 2.1.0b1

Fred-cell opened this issue · comments

commented

With batch=1 and 1024-512 (input/output tokens), it hung as below:
THE MYSTERY OF THE CITY](9781441125608_epub_itb-ch5.xhtml)
The man's journey took him to the heart of the city, where he discovered a hidden underground chamber filled with ancient artifacts and mysterious symbols. He spent hours studying the symbols, trying to decipher their meaning and unlock the secrets they held.
As he delved deeper into the chamber, he began to uncover a hidden history of the city, one that was shrouded in mystery and secrecy. He discovered that the city was built on an ancient site, one that was said to hold the power of the gods.
The man's journey took him to the city's ancient temple, where he discovered a hidden chamber filled with ancient artifacts and mysterious symbols. He spent hours studying the symbols, trying to decipher their meaning and unlock the secrets they held.
As he delved deeper into the chamber, he began to uncover a hidden history of the city, one that was shrouded in mystery and secrecy. He discovered that the city was built on an ancient site, one that was said to hold the power of the gods.
The man's journey took him to the city's ancient temple, where he discovered a hidden chamber filled with ancient artifacts and mysterious symbols. He spent hours studying the symbols, trying to decipher their meaning and unlock the secrets they held.
As he delved deeper into the chamber, he began to uncover a hidden history of the city, one that was shrouded in mystery and secrecy. He discovered that the city was built on an ancient site, one that was said to hold the power of the gods.
The man's journey took him to the city's ancient temple, where he discovered a hidden chamber filled with ancient artifacts and mysterious symbols. He spent hours studying the symbols, trying to decipher their meaning and unlock the secrets they held.
As he delved deeper into the chamber, he began to uncover a hidden history of the city, one that was shrouded in mystery and secrecy. He discovered that the city was built on an ancient site, one that was said to hold the power of the gods.
The man's journey took him to the city's ancient temple, where he discovered a hidden chamber filled with ancient artifacts and mysterious
2024-05-27 23:12:00,670 - WARNING - The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
2024-05-27 23:12:00,670 - WARNING - Setting pad_token_id to eos_token_id:128001 for open-end generation.
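The two warnings above are easy to silence by passing an explicit `attention_mask` and `pad_token_id` to `generate()`. A minimal sketch of building the mask by hand (the token ids here are made up for illustration; the pad id 128001 is the `eos_token_id` reported in the log):

```python
# Build an attention mask (1 = real token, 0 = padding) from input ids.
PAD_ID = 128001  # eos_token_id reused as pad, per the warning in the log

input_ids = [128000, 9906, 1917, PAD_ID, PAD_ID]  # example ids, right-padded
attention_mask = [0 if tok == PAD_ID else 1 for tok in input_ids]

# Then pass both explicitly, e.g.:
#   model.generate(input_ids, attention_mask=attention_mask, pad_token_id=PAD_ID)
print(attention_mask)
```

In practice the tokenizer returns this mask directly (`tokenizer(..., return_tensors="pt")` yields both `input_ids` and `attention_mask`); the point is simply to forward it to `generate()` instead of letting the model guess.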

commented

For llama2-7b-chat-hf, the error is as below:
Traceback (most recent call last):
File "/home/intel/LLM/ipex-llm/python/llm/dev/benchmark/all-in-one/run.py", line 1835, in
File "/home/intel/LLM/ipex-llm/python/llm/dev/benchmark/all-in-one/run.py", line 75, in run_model
File "/home/intel/LLM/ipex-llm/python/llm/dev/benchmark/all-in-one/run.py", line 484, in run_transformer_int4_gpu
OSError: [Errno 24] Too many open files: '/home/intel/LLM/ipex-llm/python/llm/dev/benchmark/all-in-one/transformer_int4_gpu-results-2024-05-27.csv'

"Too many open files" is a common Linux limit: if your `ulimit -n` is 1024, raise the maximum number of open files to 65536.
See our FAQ:
https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/FAQ/faq.html#too-many-open-files
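The limit can also be raised from inside the benchmark process itself via the standard-library `resource` module, which avoids editing shell configuration. A sketch (65536 is the value suggested above; raising the hard limit itself still requires root or `/etc/security/limits.conf`):

```python
import resource

# Query the current soft/hard limits on open file descriptors.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Raise the soft limit toward 65536, capped at the hard limit
# (only root can raise the hard limit itself).
target = min(65536, hard)
if soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

print(resource.getrlimit(resource.RLIMIT_NOFILE))
```

Equivalently, `ulimit -n 65536` in the shell before launching `run.py` has the same effect for that session.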

Llama3 1k-512 batch=4 hangs in Fred's environment after two trials of the all-in-one benchmark. Memory is not fully used (13G/16G).

  • Some test case results:
    Llama3 1k-512 batch=3 runs normally.
    Llama3 32-32 batch=4 runs normally.
    Llama2-7b 1k-512 batch=4 runs normally.
    Qwen1.5-7b 1k-512 batch=4 hits a similar problem to Llama3.

  • Printed logs when running Llama3 1k-512 batch=4 show that in the third trial the inference is not actually hanging, but running very slowly.

  • Tested on the DDR4 server arc01: Llama3 1k-512 batch=4 runs normally.