Segmentation fault while running Whisper on Arc
Ruoyu-y opened this issue
Configuration:
OS: Ubuntu 24.04
CPU: 12th Gen Intel(R) Core(TM) i9-12900K
Memory: 16G
GPU: 04:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08)
software:
torch 2.1.0a0+cxx11.abi
intel-extension-for-pytorch 2.1.10+xpu
ipex-llm 2.2.0b20250322
bigdl-core-xe-21 2.6.0b20250322
Issue:
Running Whisper with `python ./recognize.py` fails with a segmentation fault.
Logs:
$ python recognize.py
/home/cloud/ruoyu/miniforge3/envs/llm/lib/python3.11/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
/home/cloud/ruoyu/miniforge3/envs/llm/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
2025-03-25 09:18:43,572 - INFO - intel_extension_for_pytorch auto imported
2025-03-25 09:18:43,855 - INFO - PyTorch version 2.1.0a0+cxx11.abi available.
step1:
/home/cloud/ruoyu/miniforge3/envs/llm/lib/python3.11/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
2025-03-25 09:18:46,419 - INFO - Converting the current model to sym_int4 format......
LIBXSMM_VERSION: main_stable-1.17-3651 (25693763)
LIBXSMM_TARGET: adl [12th Gen Intel(R) Core(TM) i9-12900K]
Registry and code: 13 MB
Command: python recognize.py
Uptime: 3.432546 s
Segmentation fault
Any hints on this issue, or a recommended configuration?
Hi,
May I ask whether this segmentation fault occurs only with Whisper, or also when running other models from https://github.com/intel/ipex-llm/tree/main/python/llm/example/GPU/HuggingFace/LLM ?
Also, you may use our script to check the environment so that we can better help diagnose the issue: https://github.com/intel/ipex-llm/tree/main/python/llm/scripts#usage
Other LLMs also return the segmentation fault error, but everything works inside a Docker container. Here's the output of the environment check script:
$ bash env-check.sh
-----------------------------------------------------------------
PYTHON_VERSION=3.11.11
-----------------------------------------------------------------
transformers=4.36.2
-----------------------------------------------------------------
torch=2.1.0a0+cxx11.abi
-----------------------------------------------------------------
ipex-llm Version: 2.2.0b20250322
-----------------------------------------------------------------
ipex=2.1.10+xpu
-----------------------------------------------------------------
CPU Information:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 24
On-line CPU(s) list: 0-23
Vendor ID: GenuineIntel
Model name: 12th Gen Intel(R) Core(TM) i9-12900K
CPU family: 6
Model: 151
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 2
CPU(s) scaling MHz: 22%
CPU max MHz: 5200.0000
CPU min MHz: 800.0000
-----------------------------------------------------------------
Total CPU Memory: 15.3286 GB
Memory Type: DDR5
-----------------------------------------------------------------
Operating System:
Ubuntu 24.04 LTS \n \l
-----------------------------------------------------------------
Linux cloudgpu 6.8.0-52-generic #53-Ubuntu SMP PREEMPT_DYNAMIC Sat Jan 11 00:06:25 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
CLI:
Version: 1.2.39.20240906
Build ID: 11f3c29a
Service:
Version: 1.2.39.20240906
Build ID: 11f3c29a
Level Zero Version: 1.17.0
-----------------------------------------------------------------
Driver Version 2023.16.12.0.12_195853.xmain-hotfix
Driver Version 2023.16.12.0.12_195853.xmain-hotfix
-----------------------------------------------------------------
Driver related package version:
ii intel-fw-gpu 2024.17.5-329~22.04 all Firmware package for Intel integrated and discrete GPUs
ii intel-level-zero-gpu 1.3.29735.27-914~22.04 amd64 Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii intel-level-zero-gpu-raytracing 1.0.0-60~u22.04 amd64 Level Zero Ray Tracing Support library
-----------------------------------------------------------------
igpu not detected
-----------------------------------------------------------------
xpu-smi is properly installed.
-----------------------------------------------------------------
No device discovered
GPU0 Memory ize=256M
-----------------------------------------------------------------
04:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd DG2 [Arc A770]
Flags: bus master, fast devsel, latency 0, IRQ 234, IOMMU group 20
Memory at 86000000 (64-bit, non-prefetchable) [size=16M]
Memory at 4050000000 (64-bit, prefetchable) [size=256M]
Expansion ROM at 87000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915, xe
-----------------------------------------------------------------
Is there anything wrong with the configuration?
To provide more details: on the same machine, I can run the inference service in Docker following the guide https://github.com/intel/ipex-llm/blob/main/docs/mddocs/DockerGuides/vllm_docker_quickstart.md. But I cannot run Whisper or the other LLMs under the python/llm/example/GPU/HuggingFace/LLM folder on my host. I also tried running the Whisper Python file inside a Docker container brought up following the previous guide, and it failed as well. Please help take a look @hkvision, thanks a lot!
Hi, we checked your environment; the following part might indicate an issue:
-----------------------------------------------------------------
No device discovered
GPU0 Memory ize=256M
Could you use `sycl-ls` and `xpu-smi discovery` to confirm whether the Arc device is properly detected? Thanks!
xpu-smi discovery
`xpu-smi discovery` returns "No device discovered", but I can find the Arc card using lspci. I am using the in-tree driver on Ubuntu 24.04; could that be causing the issue? @hkvision
From your lspci result below, the 256M memory size does not look correct; it should be around 16G. Could you check whether the card is set up properly (e.g., Resizable BAR enabled)? Also, is the output of `sycl-ls` as expected on your machine?
04:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08) (prog-if 00 [VGA controller])
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd DG2 [Arc A770]
Flags: bus master, fast devsel, latency 0, IRQ 234, IOMMU group 20
Memory at 86000000 (64-bit, non-prefetchable) [size=16M]
Memory at 4050000000 (64-bit, prefetchable) [size=256M]
Expansion ROM at 87000000 [disabled] [size=2M]
Capabilities: <access denied>
Kernel driver in use: i915
Kernel modules: i915, xe
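As a quick way to sanity-check Resizable BAR from lspci output like the above: with ReBAR enabled, the 64-bit prefetchable BAR of a 16 GB A770 should report a size around 16G rather than 256M. A minimal, hypothetical parser (not part of any Intel tooling) for `lspci -v` text:

```python
import re

def prefetchable_bar_sizes(lspci_text):
    """Extract the sizes of 64-bit prefetchable BARs from `lspci -v` output."""
    pattern = re.compile(
        r"Memory at [0-9a-f]+ \(64-bit, prefetchable\) \[size=([0-9]+[KMG])\]"
    )
    return pattern.findall(lspci_text)

# Sample taken from the lspci output quoted in this thread.
sample = """\
04:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A770] (rev 08)
\tMemory at 86000000 (64-bit, non-prefetchable) [size=16M]
\tMemory at 4050000000 (64-bit, prefetchable) [size=256M]
"""

print(prefetchable_bar_sizes(sample))  # -> ['256M']; a 256M BAR suggests ReBAR is off
```

If this reports 256M on a 16 GB card, enabling Resizable BAR (and Above 4G Decoding) in the BIOS is usually the first thing to try.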
No Arc device shows up in the `sycl-ls` result. How should I fix this? When I previously ran ipex-llm inside Docker, `sycl-ls` could find the Arc.
We suspect this is not an ipex-llm issue but rather a problem with driver-related packages.
You may refer to https://dgpu-docs.intel.com/driver/client/overview.html#installing-client-gpus-on-ubuntu-desktop-24-04-lts for the driver guide.
The environment of the Docker image (Ubuntu 22.04) is here: https://github.com/intel/ipex-llm/blob/main/docker/llm/serving/xpu/docker/Dockerfile
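For reference, the client-GPU guide linked above boils down to roughly the following on Ubuntu 24.04. Treat this as a sketch: the repository and package names are taken from the guide at the time of writing and may change, so consult the guide itself before running anything.

```shell
# Sketch of the Ubuntu 24.04 client dGPU setup from the linked Intel guide.
# Repository/package names may have changed; the guide is authoritative.
sudo apt-get update
sudo apt-get install -y software-properties-common
sudo add-apt-repository -y ppa:kobuk-team/intel-graphics
sudo apt-get update
# User-space compute runtime (OpenCL + Level Zero) and a verification tool:
sudo apt-get install -y intel-opencl-icd libze1 libze-intel-gpu1 clinfo
# Verify the card is visible to the compute runtime:
clinfo | grep "770"
```

If `clinfo` sees the device but `sycl-ls` still does not, the oneAPI environment (e.g. `source /opt/intel/oneapi/setvars.sh`) is worth re-checking as well.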
I followed the guide you provided and reinstalled the driver. Using the command `clinfo | grep "770"` given at the end of the tutorial, I can now see the device. I then installed the other dependencies according to https://github.com/intel/ipex-llm/blob/main/docs/mddocs/Quickstart/install_linux_gpu.md#install-oneapi, and everything seemed fine. But in the end I still hit the segmentation fault. Any other suggestions?
Or is there an example of running Whisper in a Docker container?
Thanks for the guidance. The issue has been resolved.
Synced offline; `pip install trl==0.11.0` solves the problem.
Feel free to tell us if there are further issues later :)
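Since the fix came down to pinning a package version, a small sanity check can confirm an environment matches the versions reported working in this thread. The helper below is hypothetical (not part of ipex-llm); the `EXPECTED` pins are just the one from this thread and should be adjusted to your setup:

```python
from importlib.metadata import version, PackageNotFoundError

# Version reported working in this thread; extend for your own setup.
EXPECTED = {
    "trl": "0.11.0",
}

def check_pins(expected):
    """Return {package: (expected, installed)} for every mismatch.

    Installed version is None if the package is not installed at all.
    """
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches

if __name__ == "__main__":
    bad = check_pins(EXPECTED)
    for pkg, (want, have) in bad.items():
        print(f"{pkg}: expected {want}, found {have}")
    if not bad:
        print("all pins match")
```

Running this before filing an issue makes it easy to spot a dependency that drifted from a known-good combination.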
Shall we update the example readme?
Sure :)

