Unable to fully load model into VRAM using the Ollama portable zip (GPU)
dttprofessor opened this issue
System: U265K (iGPU off) + 48 GB RAM + B580 (12 GB VRAM)
deepseek-r1:14b (Q4):
The B580's video memory is enough to hold the deepseek-r1:14b (Q4) model, but a segmentation error occurs: less than 7 GB is loaded into VRAM, and the rest is loaded into shared GPU memory.
deepseek-r1:32b (Q4):
12 GB of the model is loaded into dedicated GPU memory, and the remaining 8 GB is loaded into shared GPU memory. System RAM is essentially unused, and the CPU cannot participate in inference.
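As a rough estimate on my side (not from the original report): at Q4 quantization the 32b weights alone are around 19-20 GB, so they cannot fit entirely in the B580's 12 GB of dedicated VRAM, while the 14b model at roughly 9 GB should fit. One way to see how Ollama itself reports the split once a model is loaded is the stock "ollama ps" command:

rem Run after the model has been loaded; the PROCESSOR column reports the
rem CPU/GPU layer split (e.g. "100% GPU" or "24%/76% CPU/GPU").
ollama ps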
Could you check your GPU's VRAM usage before loading the model?
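On Windows, one way to watch dedicated vs. shared GPU memory while the model loads (my suggestion, not part of the original exchange; Task Manager's GPU page shows the same counters) is typeperf:

rem Sample the GPU memory counters once; raise -sc to keep sampling over time.
typeperf "\GPU Adapter Memory(*)\Dedicated Usage" "\GPU Adapter Memory(*)\Shared Usage" -sc 1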
set OLLAMA_NUM_GPU=999
set no_proxy=localhost,127.0.0.1
set ZES_ENABLE_SYSMAN=1
set SYCL_CACHE_PERSISTENT=1
set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
set OLLAMA_KEEP_ALIVE=-1
set OLLAMA_NUM_PARALLEL=1
rem Note: the four lines below are not recognized Ollama environment variables;
rem num_ctx and num_predict are Modelfile PARAMETER directives (see the sketch below).
set OLLAMA_PARAMETER num_ctx 16384
set OLLAMA_PARAMETER num_predict 8192
set PARAMETER num_ctx 16384
set PARAMETER num_predict 8192
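A minimal sketch of where num_ctx and num_predict normally go, assuming a Modelfile built on top of the stock tag (the file name and the derived model name deepseek-r1-16k are placeholders of mine):

# Modelfile: build once with "ollama create deepseek-r1-16k -f Modelfile",
# then start the derived model with "ollama run deepseek-r1-16k".
FROM deepseek-r1:14b
PARAMETER num_ctx 16384
PARAMETER num_predict 8192

Alternatively, inside an interactive "ollama run" session, "/set parameter num_ctx 16384" applies the same setting for that session only.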