intel / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, DeepSeek, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, DeepSpeed, Axolotl, etc.

Repository from GitHub: https://github.com/intel/ipex-llm

Unable to fully load model into VRAM using the Ollama portable zip (GPU)

dttprofessor opened this issue

System: U265K (iGPU off) + 48 GB RAM + B580 (12 GB VRAM)

deepseek-r1:14b (Q4):
The B580's video memory is enough to hold the deepseek-r1:14b (Q4) model, but a segmentation fault occurs: less than 7 GB is loaded into dedicated VRAM, and the rest is placed in shared GPU memory.

deepseek-r1:32b (Q4):
12 GB of the model is loaded into dedicated GPU memory and the remaining 8 GB into shared GPU memory. System RAM is barely used, and the CPU does not take part in inference.

Could you check your GPU's VRAM usage before loading the model?
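For example, one way to see how much of an already-loaded model actually ended up in VRAM versus shared/CPU memory is Ollama's `/api/ps` endpoint, which reports a total `size` and a `size_vram` per loaded model. A minimal sketch, assuming the server runs at the default address `http://localhost:11434`:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address; adjust if needed

# /api/ps lists currently loaded models with their total size and the
# portion reported as resident in GPU memory (size_vram).
with urllib.request.urlopen(f"{OLLAMA_URL}/api/ps") as resp:
    data = json.load(resp)

for model in data.get("models", []):
    total = model["size"]
    in_vram = model["size_vram"]
    print(f"{model['name']}: {in_vram / 2**30:.1f} GiB of "
          f"{total / 2**30:.1f} GiB in GPU memory "
          f"({100 * in_vram / max(total, 1):.0f}%)")
```

The same split is also shown by `ollama ps` on the command line.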

set OLLAMA_NUM_GPU=999

set no_proxy=localhost,127.0.0.1

set ZES_ENABLE_SYSMAN=1

set SYCL_CACHE_PERSISTENT=1

set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

set OLLAMA_KEEP_ALIVE=-1

set OLLAMA_NUM_PARALLEL=1

set OLLAMA_PARAMETER num_ctx 16384

set OLLAMA_PARAMETER num_predict 8192

set PARAMETER num_ctx 16384

set PARAMETER num_predict 8192
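A note on the last four lines: `OLLAMA_PARAMETER` does not appear to be an Ollama environment variable, and `PARAMETER num_ctx` / `PARAMETER num_predict` are Modelfile directives rather than shell variables, so those `set` commands likely have no effect. These options can instead be passed per request through the API (or baked into a derived model via a Modelfile and `ollama create`). A hedged sketch, assuming the deepseek-r1:14b tag from this issue and the default server address:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default Ollama address; adjust if needed

# Per-request options replace the invalid `set PARAMETER ...` lines above:
# num_ctx/num_predict control context and generation length; num_gpu asks
# Ollama to offload as many layers as possible to the GPU.
payload = {
    "model": "deepseek-r1:14b",   # model tag from this issue
    "prompt": "Hello",
    "stream": False,
    "options": {
        "num_ctx": 16384,
        "num_predict": 8192,
        "num_gpu": 999,
    },
}

req = urllib.request.Request(
    f"{OLLAMA_URL}/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```

Also worth noting: a 16384-token context enlarges the KV cache, so the GPU memory actually needed is noticeably larger than the bare Q4 weights, which can by itself push part of the model into shared GPU memory.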