intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Add support for StableLM-2-12B on GPUs

aahouzi opened this issue

Type of Change

Stability AI open-sourced StableLM-2-12B, whose architecture differs from that of its 1.6B and 3B counterparts. This issue requests adding support for the stabilityai/stablelm-2-12b and stabilityai/stablelm-2-12b-chat models to IPEX-LLM.
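
As a reference, here is a minimal sketch of how these models would presumably be loaded once support lands, following the pattern of the existing ipex-llm GPU examples; the model ID, prompt, and generation settings are illustrative:

```python
# Hedged sketch: expected usage once StableLM-2-12B is supported, mirroring
# the pattern of other ipex-llm GPU examples (not yet working for this model).
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "stabilityai/stablelm-2-12b-chat"

# Load with INT4 weight-only quantization applied by ipex-llm.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    trust_remote_code=True,
)
model = model.to("xpu")  # run on an Intel GPU (iGPU or Arc/Flex/Max)

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
inputs = tokenizer("What is IPEX-LLM?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output_ids = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```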

Description

  • Uses grouped-query attention (GQA) instead of multi-head attention (MHA), a parallel MLP layer, and per-head QK normalization (see the sketch after this list)
  • Model description: StableLM-2-12B
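
To illustrate the per-head QK normalization, here is a minimal PyTorch sketch loosely following the Hugging Face StableLM implementation; the class name, tensor shapes, and head sizes are assumptions for illustration, not IPEX-LLM code:

```python
# Sketch of per-head QK normalization: each attention head gets its own
# LayerNorm, applied to the query/key projections before attention scores
# are computed. Names and sizes below are illustrative assumptions.
import torch
import torch.nn as nn


class LayerNormPerHead(nn.Module):
    def __init__(self, num_heads: int, head_dim: int, eps: float = 1e-5):
        super().__init__()
        # One independent LayerNorm (separate learned parameters) per head.
        self.norms = nn.ModuleList(
            nn.LayerNorm(head_dim, eps=eps) for _ in range(num_heads)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_heads, seq_len, head_dim)
        heads = x.unbind(dim=1)  # one (batch, seq_len, head_dim) tensor per head
        normed = [norm(h) for norm, h in zip(self.norms, heads)]
        return torch.stack(normed, dim=1)  # back to (batch, heads, seq, dim)


# In the attention block, q and k would be normalized per head before
# scaled-dot-product attention (sizes here are illustrative):
q_norm = LayerNormPerHead(num_heads=32, head_dim=160)
q = torch.randn(1, 32, 8, 160)
q = q_norm(q)
```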