intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.


all-in-one benchmark issue on version 2.1.0b1 for llama3-8b-instruct with 128-2048

Fred-cell opened this issue

The last iteration is very slow for llama3-8b-instruct with fp8.
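For context, the 128-2048 in the title denotes an input/output token pair from the all-in-one benchmark config. Below is a minimal sketch, not the project's all-in-one harness, of how one might time each generation pass to spot a slow final iteration; it assumes an Intel GPU (xpu) build of ipex-llm, and the model path, prompt construction, and iteration count are illustrative.

```python
import time

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Assumption: model weights are available locally or via the Hub under this id.
MODEL_PATH = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    load_in_low_bit="fp8",   # fp8 weight quantization, as in the report
    trust_remote_code=True,
)
model = model.half().to("xpu")  # run on an Intel GPU

prompt = "Hello " * 128  # roughly 128 input tokens, matching the 128-2048 pair
inputs = tokenizer(prompt, return_tensors="pt").to("xpu")

with torch.inference_mode():
    for i in range(5):  # time each iteration to see whether the last one stalls
        torch.xpu.synchronize()
        start = time.perf_counter()
        model.generate(**inputs, max_new_tokens=2048)
        torch.xpu.synchronize()
        print(f"iteration {i}: {time.perf_counter() - start:.2f}s")
```

If the last iteration is consistently slower, comparing the timings with and without fp8 quantization would help isolate whether the slowdown is specific to the fp8 path.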

We cannot reproduce the issue in the environment provided by the user; we will sync on the details offline.

After syncing with the user offline, we still cannot reproduce the issue in the user's environment. We will keep monitoring for similar behavior; closing the issue for now.