support >= 4GB SYCL compute buffer size for longer context length

Question

ytliew82 opened this issue 7 months ago

Describe the bug
The SYCL Unified Shared Memory (USM) device memory type has a maximum allocation size of 4 GB. ipex-llm reports an error if the calculated KV cache size is more than 4 GB.

How to reproduce
Setup: a computer running iGPU-only inference with >= 32 GB of RAM, so I expected no allocation issue with a larger context size.
I encountered this issue with the Gemma-3 model.

Steps to reproduce the error:

Set the -c argument to a small value.
Check the size reported for the SYCL buffer; it is safe if it is less than 4 GB.
Increase the -c argument until the expected buffer size is larger than 4 GB. The memory allocation error is then reported.
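
To pick -c values on either side of the limit without trial and error, the KV cache size can be estimated up front. The sketch below is a rough estimate only. It assumes a plain full-attention cache with one K and one V tensor per layer and 2-byte elements. The model shape (32 layers, 8 KV heads, head dimension 128) is hypothetical and is not taken from Gemma-3. The buffer size that the backend actually reports may differ.

```python
GIB = 1024 ** 3

def kv_cache_bytes(n_ctx, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Estimate KV cache size: K and V tensors for every layer and token."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem

def max_ctx_within_limit(limit_bytes, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Largest context length whose estimated cache does not exceed limit_bytes."""
    per_token = kv_cache_bytes(1, n_layers, n_kv_heads, head_dim, bytes_per_elem)
    return limit_bytes // per_token

# Hypothetical model shape, for illustration only (not Gemma-3's real config).
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)

for n_ctx in (8192, 32768, 65536):
    size = kv_cache_bytes(n_ctx, **shape)
    print(f"-c {n_ctx}: {size / GIB:.2f} GiB")

print("largest -c within 4 GiB:", max_ctx_within_limit(4 * GIB, **shape))
```

With these assumed numbers each token costs 128 KiB of cache, so the estimate reaches 4 GiB at a context of 32768 tokens.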

Additional context
I am running the Gemma-3 model with llama-server, so I expect a similar issue with other models, MoE models included.

Declaring multiple SYCL USM device allocations might overcome this constraint and allow a buffer larger than 4 GB for longer context lengths (a few k tokens and above, and the case with parallel enabled).
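
The idea above can be sketched as a planning step that splits one large logical buffer into several allocations, each no larger than a per-allocation cap. This is only an illustration of the bookkeeping, not how ipex-llm or the SYCL backend is implemented. The function names and the 4 GiB cap are assumptions, and a real backend would also need its kernels to handle tensors that span chunk boundaries.

```python
GIB = 1024 ** 3

def plan_chunks(total_bytes, max_alloc_bytes):
    """Split a logical buffer into (offset, size) chunks, each <= max_alloc_bytes."""
    if total_bytes < 0 or max_alloc_bytes <= 0:
        raise ValueError("sizes must be positive")
    chunks = []
    offset = 0
    while offset < total_bytes:
        size = min(max_alloc_bytes, total_bytes - offset)
        chunks.append((offset, size))  # each entry would be one device allocation
        offset += size
    return chunks

def locate(chunks, byte_offset):
    """Map an offset in the logical buffer to (chunk index, offset inside chunk)."""
    for index, (start, size) in enumerate(chunks):
        if start <= byte_offset < start + size:
            return index, byte_offset - start
    raise IndexError("offset is outside the logical buffer")

# Hypothetical example: a 10 GiB logical buffer with a 4 GiB per-allocation cap.
chunks = plan_chunks(10 * GIB, 4 * GIB)
for start, size in chunks:
    print(f"offset {start / GIB:.0f} GiB, size {size / GIB:.0f} GiB")
print(locate(chunks, 5 * GIB))
```

In this example the 10 GiB buffer becomes three allocations of 4, 4 and 2 GiB, and a logical offset of 5 GiB falls 1 GiB into the second one.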

toncao commented 7 months ago

+1

Yina Chen · Answer 1 · Tue Apr 01 2025 16:38:33 GMT+0800 (China Standard Time)

Hi ytliew82,

We previously encountered the same error with Gemma-3 4B on ARC, while Gemma-3 12B seemed to work fine. Are you using the 4B model in your test?

ytliew82 · Answer 2 · Wed Apr 02 2025 00:20:16 GMT+0800 (China Standard Time)

Tested with Gemma-3 4B and 12B; both give the same error about not fitting into the device buffer.
For now I run CPU-only inference as a workaround, and limit the -ngl argument so that it fits into the 4 GB device buffer.

Anyway, based on my understanding, the host/device/shared USM types mostly apply to dGPUs.
https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2025-0/unified-shared-memory-allocations.html#USM-ALLOCATION

Since the iGPU shares the L3 cache with the CPU, could we optionally use a shared buffer instead of a device buffer when initializing with --device IGPU?

Yina Chen · Answer 3 · Wed Apr 02 2025 10:00:48 GMT+0800 (China Standard Time)

Hi ytliew82,

Thank you for the information! We'll provide updates once it's supported.