kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Home Page: https://kvcache-ai.github.io/ktransformers/


[Bug] Conflict between torch_npu and the CUDA build of torch causes a package import failure

GioGioBond opened this issue · comments

Checklist

  • 1. I have searched for related issues but could not get the help I expected
  • 2. The bug has not been fixed in the latest version
  • 3. Please note that if a bug report lacks the corresponding environment information and a minimal reproducible example, it will be hard for us to reproduce and locate the problem, which lowers the chance of getting feedback
  • 4. If you are raising a question rather than a bug, please start a discussion at https://github.com/kvcache-ai/ktransformers/discussions. Otherwise the issue will be closed
  • 5. To facilitate community communication, I will use Chinese/English or attach a Chinese/English translation (if using another language). Non-Chinese/English content without a translation may be closed

Problem description

GPU: A800 80 GB
Environment:
Python 3.11.14
ktransformers 0.3.2+cu126torch29fancy
torch 2.9.0+cu126
torchaudio 2.9.0+cu126
torchvision 0.24.0+cu126
The branch was pulled around 16:00 on 2025-11-03. I pulled the latest code twice in succession, but it still fails to run, reporting that setup_model_parallel is missing. The error output is as follows:

(ktransformers) user@user-R8488-G12:/data/wmh/ktransformers$ python -m ktransformers.local_chat --model_path /data/wmh/model/qwen14b --gguf_path ./qwen14b-GGUF
no balance_serve
2025-11-03 17:04:55,768 - INFO - flashinfer.jit: Prebuilt kernels not found, using JIT backend
found flashinfer
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/data/wmh/ktransformers/ktransformers/local_chat.py", line 258, in
fire.Fire(local_chat)
File "/home/user/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
^^^^^^^^^^^^^^^^^^^^
File "/home/user/miniconda3/envs/ktransformers/lib/python3.11/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^
File "/data/wmh/ktransformers/ktransformers/local_chat.py", line 92, in local_chat
local_rank, world_size = setup_model_parallel(tp=tp)
^^^^^^^^^^^^^^^^^^^^
NameError: name 'setup_model_parallel' is not defined

After investigation: in local_chat.py, the import torch_npu on line 16 fails, the exception is caught, and the handler skips the import of setup_model_parallel. The error message for import torch_npu is as follows:

Traceback (most recent call last):
File "/data/wmh/ktransformers/ktransformers/local_chat.py", line 18, in
from ktransformers.util.ascend.ascend_utils import get_absort_weight, setup_model_parallel, get_tensor_parallel_group
File "/data/wmh/ktransformers/ktransformers/util/ascend/ascend_utils.py", line 5, in
import torch_npu
ModuleNotFoundError: No module named 'torch_npu'

As for torch_npu: searching shows it is the PyTorch backend for Huawei NPUs, and it conflicts with the CUDA build of torch. I am stuck here now.
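
This is the classic failure mode of wrapping a platform-specific import in a try/except that also swallows the definitions the rest of the file needs. For illustration, here is a minimal sketch of a guarded import with a CUDA-side fallback; the fallback function is an assumption for illustration only, not the actual ktransformers code (only the import line is taken from the traceback above):

```python
# Hypothetical sketch of a platform-guarded import; the fallback is an
# illustrative assumption, not the real ktransformers implementation.
try:
    import torch_npu  # present only on Huawei Ascend NPU machines
    from ktransformers.util.ascend.ascend_utils import (
        get_absort_weight,
        setup_model_parallel,
        get_tensor_parallel_group,
    )
except ImportError:
    torch_npu = None

    def setup_model_parallel(tp: int = 1):
        # Hypothetical CUDA-side default: rank 0 in a world of size 1,
        # so the name is always defined even without torch_npu.
        local_rank, world_size = 0, 1
        return local_rank, world_size
```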

Steps to reproduce

python -m ktransformers.local_chat --model_path /data/wmh/model/qwen14b --gguf_path ./qwen14b-GGUF

The model used is Qwen2.5 14B.
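
Before reproducing, a quick sanity check (standard torch and importlib calls only) can confirm that the environment is a CUDA build of torch without torch_npu installed; the expected values are taken from the environment information below:

```python
import importlib.util

import torch

print(torch.__version__)          # expected: 2.9.0+cu126
print(torch.cuda.is_available())  # expected: True on the A800 host
# torch_npu should be absent in a CUDA-only environment:
print(importlib.util.find_spec("torch_npu"))  # expected: None
```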

Environment information

==================== System information ====================
OS: Ubuntu 22.04.2 LTS
Kernel: 6.8.0-60-generic
Architecture: x86_64

==================== CPU information ====================
Model name: Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
Logical CPU cores: 128

==================== GPU information ====================
NVIDIA A800-SXM4-80GB, 560.35.05, 81920, 4, 16

I ran into this problem too. Did you manage to solve it?