intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository from GitHub: https://github.com/intel/ipex-llm

Support for gemma3 from Google

bigtan opened this issue

Please update Ollama; gemma3 is now supported.

Error: llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade

You need to upgrade to Ollama v0.6; this should add support for gemma3.

@puffer-duck But v0.6 doesn't support Intel GPU acceleration, am I right?

"error": {
"message": "llama runner process has terminated: this model is not supported by your version of Ollama. You may need to upgrade"

Yes, we need this. Can Intel publish patches to Ollama so that we can compile it ourselves? Or set up automatic nightly builds that follow the latest version of Ollama.
Just got my A770 but can still return it; it seems LLM/Ollama is still not straightforward with Intel :(


Note this was brought up here: #12950

That issue is generally about the version disparity rather than gemma3 specifically (but it now mentions that model as another reason).

Please!

Hi All,

Gemma3 is now supported in ipex-llm llamacpp! (Ollama support is in progress—we'll provide updates once it's ready.)

Important Notes:

The 27B Gemma3 q4_k_m model requires more than 16 GB of video memory.

  • In text mode, you can use -c 128 to reduce memory usage (a minimal sketch follows this list).
  • In vision mode (llama-gemma3-cli), you may need two Arc GPUs or a single GPU with more memory.
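
For illustration, a text-only invocation with the small context window from the first bullet might look like this (the model path and prompt are placeholders, not from this thread; the other flags mirror the full commands below):

./llama-cli -m $model_path -c 128 -n 64 -ngl 99 -t 8 --prompt "Hello"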

Get Started:

Please follow these steps to try it out:

1. Download the latest ipex-llm llamacpp portable zip:

https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/llamacpp_portable_zip_gpu_quickstart.md
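
On Linux, unpacking the portable package might look like this (the archive name is illustrative; use the actual asset name from the quickstart page above):

tar -xzf llama-cpp-ipex-llm-portable-ubuntu.tgz
cd llama-cpp-ipex-llm-portable-ubuntu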

2. Get mmproj.gguf & gemma3 gguf model files

Please download the pre-quantized version from HF: https://huggingface.co/collections/ggml-org/gemma-3-67d126315ac810df1ad9e913

(You must download both the text model and the mmproj file)

Note: Vision capability is available on these model sizes: 4b, 12b and 27b
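
One way to fetch a text model together with its mmproj file is huggingface-cli; this is a sketch, and the exact repo and file names are assumptions that should be checked against the collection above:

pip install -U "huggingface_hub[cli]"
huggingface-cli download ggml-org/gemma-3-4b-it-GGUF gemma-3-4b-it-Q4_K_M.gguf --local-dir .
huggingface-cli download ggml-org/gemma-3-4b-it-GGUF mmproj-model-f16.gguf --local-dir .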

3. Run gemma3

3.1 Linux

ngl=99
thread=8
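
The commands below also assume $model_path, $mmproj_path, and $image_path point at the files from step 2, e.g. (paths are placeholders):

model_path=./gemma-3-4b-it-Q4_K_M.gguf
mmproj_path=./mmproj-model-f16.gguf
image_path=./test.jpg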

3.1.1 Text only

./llama-cli -m $model_path --no-context-shift -n 128 --prompt "It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is supposed to be like. If you allow the user to pick how to evolve something, it's not evolution anymore - it's the equivalent of intelligent design, the fable invented by creationists to combat the idea of evolution. Being agnostic and a Pastafarian, that's not something that rubbed me the right way. Hence, my biggest dillema when deciding what to create was not with what I wanted to create, but with what I did not. I didn't want to create an 'intelligent design' simulator and wrongly call it evolution. This is a problem, of course, every other contestant also had to face. And judging by the entries submitted, not many managed to work around it. I'd say the only real solution was through the use of artificial selection, somehow. So far, I haven't seen any entry using this at its core gameplay. Alas, this is just a fun competition and after a while I decided not to be as strict with the game idea, and allowed myself to pick whatever I thought would work out. My initial idea was to create something where humanity tried to evolve to a next level, but had some kind of foe trying to stop them from doing so. I kind of had this image of human souls flying in space towards a monolith or a space baby (all based in 2001: A Space Odyssey of course) but I couldn't think of compelling (read: serious) mechanics for that. Borgs were my next inspiration, as their whole hypothesis fit pretty well into the evolution theme. But how to make it work? Are you the borg, or fighting the Borg? The third and final idea came to me through my girlfriend, who somehow gave me the idea of making something about the evolution of Pasta. The more I thought about it the more it sounded like it would work, so I decided to go with it. Conversations with my inspiring co-worker Roushey (who also created the 'Mechanical Underdogs' signature logo for my intros) further matured the concept, as it involved into the idea of having individual pieces of pasta flying around and trying to evolve until they became all-powerful. A secondary idea here was that the game would work to explain how the Flying Spaghetti Monster came to exist - by evolving from a normal dinner table. So the idea evolved more or less into this: you are sitting a table. You have your own plate, with is your 'base'. There are 5 other guests at the table, each with their own plate. 
Your plate can spawn little pieces of pasta. You do so by 'ordering' them through a menu. Some pastas are better than others; some are faster, some are stronger. They have varying 'costs', which are debited from your credits (you start with a number of credits). Once spawned, your pastas start flying around. Their instinct is to fly to other plates, in order to conquer them (the objective of the game is having your pasta conquer all the plates on the table). But they are really autonomous, so after being spawned, you have no control over your pasta (think DotA or LoL creeps). Your pasta doesn't like other people's pasta, so if they meet, they shoot sauce at each other until one dies. You get credits for other pastas your own pasta kill. Once a pasta is in vicinity of a plate" -t $thread -e -ngl $ngl --color -c 2048 --temp 0

3.1.2 Single turn (Vision)

./llama-gemma3-cli -m $model_path --mmproj $mmproj_path -ngl $ngl -t $thread -p "What is in this image?" --image $image_path

3.1.3 Chat mode (Vision)

./llama-gemma3-cli -m $model_path --mmproj $mmproj_path -ngl $ngl -t $thread

3.2 WIN

3.2.1 Text only

llama-cli.exe -m %MODEL_PATH% -n 128 --prompt "It is done, and submitted. You can play 'Survival of the Tastiest' on Android, and on the web. Playing on the web works, but you have to simulate multiple touch for table moving and that can be a bit confusing. There is a lot I'd like to talk about. I will go through every topic, insted of making the typical what went right/wrong list. Concept Working over the theme was probably one of the hardest tasks which I had to face. Originally, I had an idea of what kind of game I wanted to develop, gameplay wise - something with a lot of enemies/actors, simple graphics, maybe set in space, controlled from a top-down view. I was confident that I could fit any theme around it. In the end, the problem with a theme like 'Evolution' in a game is that evolution is unassisted. It happens through several seemingly random mutations over time, with the most apt permutation surviving. This genetic car simulator is, in my opinion, a great example of actual evolution of a species facing a challenge. But is it a game? In a game, you need to control something to reach an objective. That control goes against what evolution is supposed to be like. If you allow the user to pick how to evolve something, it's not evolution anymore - it's the equivalent of intelligent design, the fable invented by creationists to combat the idea of evolution. Being agnostic and a Pastafarian, that's not something that rubbed me the right way. Hence, my biggest dillema when deciding what to create was not with what I wanted to create, but with what I did not. I didn't want to create an 'intelligent design' simulator and wrongly call it evolution. This is a problem, of course, every other contestant also had to face. And judging by the entries submitted, not many managed to work around it. I'd say the only real solution was through the use of artificial selection, somehow. So far, I haven't seen any entry using this at its core gameplay. Alas, this is just a fun competition and after a while I decided not to be as strict with the game idea, and allowed myself to pick whatever I thought would work out. My initial idea was to create something where humanity tried to evolve to a next level, but had some kind of foe trying to stop them from doing so. I kind of had this image of human souls flying in space towards a monolith or a space baby (all based in 2001: A Space Odyssey of course) but I couldn't think of compelling (read: serious) mechanics for that. Borgs were my next inspiration, as their whole hypothesis fit pretty well into the evolution theme. But how to make it work? Are you the borg, or fighting the Borg? The third and final idea came to me through my girlfriend, who somehow gave me the idea of making something about the evolution of Pasta. The more I thought about it the more it sounded like it would work, so I decided to go with it. Conversations with my inspiring co-worker Roushey (who also created the 'Mechanical Underdogs' signature logo for my intros) further matured the concept, as it involved into the idea of having individual pieces of pasta flying around and trying to evolve until they became all-powerful. A secondary idea here was that the game would work to explain how the Flying Spaghetti Monster came to exist - by evolving from a normal dinner table. So the idea evolved more or less into this: you are sitting a table. You have your own plate, with is your 'base'. There are 5 other guests at the table, each with their own plate. 
Your plate can spawn little pieces of pasta. You do so by 'ordering' them through a menu. Some pastas are better than others; some are faster, some are stronger. They have varying 'costs', which are debited from your credits (you start with a number of credits). Once spawned, your pastas start flying around. Their instinct is to fly to other plates, in order to conquer them (the objective of the game is having your pasta conquer all the plates on the table). But they are really autonomous, so after being spawned, you have no control over your pasta (think DotA or LoL creeps). Your pasta doesn't like other people's pasta, so if they meet, they shoot sauce at each other until one dies. You get credits for other pastas your own pasta kill. Once a pasta is in vicinity of a plate" -t 8 -e -ngl 99 --color --ctx-size 1200 --no-mmap

3.2.2 Single turn (Vision)

llama-gemma3-cli.exe -m %MODEL_PATH% --mmproj %MMPROJ_PATH% -ngl 99 -t 8 -p "What is in this image?" --image %IMAGE_PATH%

3.2.3 Chat mode (Vision)

llama-gemma3-cli.exe -m %MODEL_PATH% --mmproj %MMPROJ_PATH% -ngl 99 -t 8 

Thank you for your efforts. I see that the Ollama portable zip in the pre-release has been updated to 20250313, but it seems that it still cannot run gemma3 properly.

Will it be possible to run it via the server or Python? Many thanks.

We will release the Ollama portable zip with gemma3 support soon.

Hi All, you may install our latest version of ipex-llm ollama via pip install --pre --upgrade ipex-llm[cpp] to run gemma3 as below:

  1. Run Ollama with GGUF Model on ModelScope

    ollama run modelscope.cn/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M
    
  2. Run Ollama with GGUF Model on HuggingFace Hub

  • You may pull a GGUF model from HuggingFace Hub and then create an Ollama model with a Modelfile.
    # In the Modelfile
    FROM /path/to/gemma-3-4b-it-Q4_K_M.gguf
    
    # Create an Ollama model
    ollama create gemma3-gguf -f Modelfile
    

You may see ipex-llm ollama quickstart for more details.
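
End to end, the HuggingFace route above might look like this (the GGUF path and model name are illustrative, matching the create step):

    echo "FROM /path/to/gemma-3-4b-it-Q4_K_M.gguf" > Modelfile
    ollama create gemma3-gguf -f Modelfile
    ollama run gemma3-gguf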

@sgwhat No Ollama Portable Zip?

I've tried the official gemma3 models in 4b and 12b, as well as the q4_K_M versions from Ollama, and also lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M from HuggingFace; none of them seem to work after setting up a new conda environment from the quickstart link and running pip install --pre --upgrade ipex-llm[cpp]. I've attached the logs from running ./ollama run https://huggingface.co/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M. Thank you!

ollama-gemma.txt

emm ... Any new progress?

@ExplodingDragon @yizhangliu Releasing. You may see #12963 (comment) to run it first.

@sgwhat It looks good. Are there any plans to submit the Ollama patches upstream?

oneAPI already offers out-of-the-box support on certain systems like Arch Linux. Could you consider providing a statically linked Ollama or a similar package?

Thanks. But it's not easy to do "pip install --pre --upgrade ipex-llm[cpp]".

Hi All,

The Ollama portable zip is now available! Please follow the instructions at the link to download.

Note 1: For Gemma3, for now you need to either use ModelScope as the model source (see details here: link) or run a local GGUF model downloaded from HuggingFace (see details here: link)

Note 2: The text input support for Gemma3 is ready, while the image input support is still WIP for Ollama
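
Combining the notes above, the ModelScope route would look like this (the model tag follows the earlier ModelScope example in this thread):

export IPEX_LLM_MODEL_SOURCE=modelscope
./ollama run modelscope.cn/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M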

After deployment, I asked a few questions about pictures, but the answers were incorrect.
When I used LM Studio for deployment, there was no problem answering picture questions.

Hi @cunkai, ipex-llm ollama currently does not have good support for the image part of Gemma3; only the text part is fully supported. We will add full support in a future ipex-llm ollama 0.6.x release.

For now, you may also run a local GGUF model downloaded from HF; see #12963 (comment)

Edit: It works with a model file from HF.

Unfortunately I can't get it to work:
tom@computer: /usr/share/ollama $ export IPEX_LLM_MODEL_SOURCE=modelscope
tom@computer: /usr/share/ollama $ ollama run gemma3:12b
tom@computer: /usr/share/ollama $ ollama run modelscope.cn/lmstudio-community/gemma-3-12b-it-GGUF:Q4_K_M --verbose
ggml_sycl_init: found 1 SYCL devices:
Error: llama runner process has terminated: exit status 2
tom@computer: /usr/share/ollama $ ollama --version
ggml_sycl_init: found 1 SYCL devices:
ollama version is 0.5.4-ipexllm-20250318
tom@computer: /usr/share/ollama $ journalctl -u ollama.service

Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: format           = GGUF V3 (latest)
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: arch             = gemma3
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: vocab type       = SPM
[...]
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: model type       = 12B
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: model ftype      = Q4_K - Medium
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: model params     = 11.77 B
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: model size       = 6.79 GiB (4.96 BPW)
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: general.name     = Gemma 3 12b It
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: BOS token        = 2 '<bos>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: EOS token        = 1 '<eos>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: EOT token        = 106 '<end_of_turn>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: UNK token        = 3 '<unk>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: PAD token        = 0 '<pad>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: LF token         = 248 '<0x0A>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: EOG token        = 1 '<eos>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: EOG token        = 106 '<end_of_turn>'
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_print_meta: max token length = 48
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors: offloading 48 repeating layers to GPU
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors: offloading output layer to GPU
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors: offloaded 49/49 layers to GPU
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors:        SYCL0 model buffer size =  6956.18 MiB
Mar 19 10:09:41 proxmox1 ollama[2002914]: llm_load_tensors:          CPU model buffer size =   787.50 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_seq_max     = 1
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_ctx         = 16384
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_ctx_per_seq = 16384
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_batch       = 512
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_ubatch      = 512
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: flash_attn    = 0
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: freq_base     = 1000000.0
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: freq_scale    = 0.125
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: n_ctx_per_seq (16384) < n_ctx_train (131072) -- the full capacity of the model will >
Mar 19 10:09:53 proxmox1 ollama[2002914]: [SYCL] call ggml_check_sycl
Mar 19 10:09:53 proxmox1 ollama[2002914]: ggml_check_sycl: GGML_SYCL_DEBUG: 0
Mar 19 10:09:53 proxmox1 ollama[2002914]: ggml_check_sycl: GGML_SYCL_F16: no
Mar 19 10:09:53 proxmox1 ollama[2002914]: Found 1 SYCL devices:
Mar 19 10:09:53 proxmox1 ollama[2002914]: |  |                   |                                       |       |Max    |        |Max  |Global |           >
Mar 19 10:09:53 proxmox1 ollama[2002914]: |  |                   |                                       |       |compute|Max work|sub  |mem    |           >
Mar 19 10:09:53 proxmox1 ollama[2002914]: |ID|        Device Type|                                   Name|Version|units  |group   |group|size   |       Driv>
Mar 19 10:09:53 proxmox1 ollama[2002914]: |--|-------------------|---------------------------------------|-------|-------|--------|-----|-------|----------->
Mar 19 10:09:53 proxmox1 ollama[2002914]: | 0| [level_zero:gpu:0]|                Intel Arc A770 Graphics|  12.55|    512|    1024|   32| 16225M|         1.>
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_kv_cache_init:      SYCL0 KV buffer size =  6144.00 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: KV self size  = 6144.00 MiB, K (f16): 3072.00 MiB, V (f16): 3072.00 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model:  SYCL_Host  output buffer size =     1.01 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model:      SYCL0 compute buffer size =   671.00 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model:  SYCL_Host compute buffer size =    71.51 MiB
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: graph nodes  = 1975
Mar 19 10:09:53 proxmox1 ollama[2002914]: llama_new_context_with_model: graph splits = 2
Mar 19 10:09:53 proxmox1 ollama[2002914]: key general.file_type not found in file
Mar 19 10:09:53 proxmox1 ollama[2002914]: terminate called after throwing an instance of 'std::runtime_error'
Mar 19 10:09:53 proxmox1 ollama[2002914]:   what():  Missing required key: general.file_type
Mar 19 10:09:53 proxmox1 ollama[2002914]: SIGABRT: abort
Mar 19 10:09:53 proxmox1 ollama[2002914]: PC=0x7e1fb84a9eec m=9 sigcode=18446744073709551610
Mar 19 10:09:53 proxmox1 ollama[2002914]: signal arrived during cgo execution

I'll try to download from HF instead.

Tried ollama-ipex-llm-2.2.0b20250318-ubuntu.tgz, I got:

❯ ./ollama run gemma3:4b
Error: llama runner process has terminated: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_load_model_from_file: failed to load model

Tried ollama-ipex-llm-2.2.0b20250318-ubuntu.tgz, I got:

❯ ./ollama run gemma3:4b
Error: llama runner process has terminated: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_load_model_from_file: failed to load model

See #12963 (comment)

while the image input support is still WIP

Yesterday I tried to run it directly through llama.cpp, and it worked (on 2-3x A770).

On the model gemma3 27B Q8 (modelscope.cn/lmstudio-community/gemma-3-27b-it-GGUF:Q8_0)

this command:

c:\llm\llama-cpp>llama-gemma3-cli -m C:\Users\uuk\.ollama\models\blobs\sha256-e1ef8587b2bdcbf4c2f888f3f618626dcee42096d0e38b63b26cbef4a1a56da8 --mmproj C:\llm\models\mmproj-model-f16.gguf -ngl 999 -t 8 -p "What is in this image?" --image C:\llm\models\1.jpg

Yes, the support in llama.cpp is complete (see #12963 (comment)); the image support in Ollama is still in progress.

It's OK.
But, the output results are somewhat verbose.

Using the GGUF version and the instructions from #12963 (comment) along with the portable version made it work. I'm getting some strange results from the GGUF version, but I'm seeing those strange results on my AMD-based machine too, so that seems unrelated. Thanks for the help y'all!

Tried ollama-ipex-llm-2.2.0b20250318-ubuntu.tgz, I got:

❯ ./ollama run gemma3:4b
Error: llama runner process has terminated: error loading model: error loading model hyperparameters: key not found in model: gemma3.attention.layer_norm_rms_epsilon
llama_load_model_from_file: failed to load model

See #12963 (comment)

@jason-dai no luck with that either

❯ ./ollama run modelscope.cn/lmstudio-community/gemma-3-4b-it-GGUF
Error: llama runner process has terminated: exit status 2

on the serve side:

key general.file_type not found in file
terminate called after throwing an instance of 'std::runtime_error'
  what():  Missing required key: general.file_type
SIGABRT: abort
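
As a side note, the missing metadata key can be inspected directly in the GGUF file with the gguf-dump utility from the gguf Python package (an outside suggestion, not something used in this thread; the model path is a placeholder):

pip install gguf
gguf-dump /path/to/model.gguf | grep file_type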

I got the same error, got it working by downloading models from HF instead.

Same, I am facing the same issue just like everyone else:

(llm) C:\Users\Ghoul>ollama.exe run modelscope.cn/lmstudio-community/gemma-3-4b-it-GGUF:Q4_K_M
ggml_sycl_init: found 1 SYCL devices:
Error: llama runner process has terminated: exit status 2

Today I tried again (ollama-ipex-llm-2.2.0b20250318-ubuntu) and it worked!

export IPEX_LLM_MODEL_SOURCE=modelscope
./ollama run gemma3

Hi all, we are working on upgrading the ipex-llm ollama version to re-support gemma3. Before that, you can run gemma3:1b. For more details, please see https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md.
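
With the portable zip layout used earlier in this thread, running the 1B variant mentioned above would be:

./ollama run gemma3:1b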