[Bug] Service fails to start after build: AttributeError: 'Engine' object has no attribute 'model_runner'
CodeZ-Hao opened this issue
Checklist
- 1. I have searched for related issues but did not find the help I needed
- 2. The problem has not been fixed in the latest version
- 3. I understand that if a bug report lacks environment information and a minimal reproducible example, it will be hard to reproduce and locate the problem, reducing the chance of getting feedback
- 4. If this is a question rather than a bug, I should start a discussion at https://github.com/kvcache-ai/ktransformers/discussions instead; otherwise the issue will be closed
- 5. To facilitate community communication, I will use Chinese/English or attach a Chinese/English translation (if using another language). Non-Chinese/English content without a translation may be closed
Problem description
After pulling and building the latest commit, the server fails to start. With the same environment, configuration, and command, the code before the pull ran successfully.
The error output is as follows:
loading model.layers.58.post_attention_layernorm.weight to cuda
loading model.layers.59.self_attn.q_norm.weight to cuda
loading model.layers.59.self_attn.k_norm.weight to cuda
loading model.layers.59.input_layernorm.weight to cuda
loading model.layers.59.post_attention_layernorm.weight to cuda
loading model.layers.60.self_attn.q_norm.weight to cuda
loading model.layers.60.self_attn.k_norm.weight to cuda
loading model.layers.60.input_layernorm.weight to cuda
loading model.layers.60.post_attention_layernorm.weight to cuda
loading model.layers.61.self_attn.q_norm.weight to cuda
loading model.layers.61.self_attn.k_norm.weight to cuda
loading model.layers.61.input_layernorm.weight to cuda
loading model.layers.61.post_attention_layernorm.weight to cuda
loading model.norm.weight to cuda
Getting inference context from sched_client.
sched_rpc started with PID: 969719
Got inference context, sending it to subscribers.
Rebuilding kvcache
62
kv_cache loaded successfully.
Process SpawnProcess-1:
Traceback (most recent call last):
File "/root/miniconda3/envs/kt/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/root/miniconda3/envs/kt/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 403, in run_engine
engine = Engine(args, token_queue, broadcast_endpoint, kvcache_event)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 280, in __init__
self.model.init_wrapper(self.args.use_cuda_graph, self.device, max(self.model_runner.cuda_graphs), args.max_batch_size, self.block_num)
^^^^^^^^^^^^^^^^^
AttributeError: 'Engine' object has no attribute 'model_runner'
^CReceived signal 2, shutting down...
Cleaning up...
Terminating sched_process 969719
/root/miniconda3/envs/kt/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 13 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
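For context, the traceback shows that `Engine.__init__` evaluates `max(self.model_runner.cuda_graphs)` at a point where the instance has no `model_runner` attribute, which is the classic read-before-assignment (or renamed-attribute) pattern. A minimal sketch of this general Python failure mode, using hypothetical names rather than the actual ktransformers code:

```python
class Engine:
    def __init__(self):
        self.device = "cuda"
        # Hypothetical reproduction: self.model_runner is read here,
        # but it has not been assigned on the instance yet, so Python
        # raises AttributeError at this line.
        graphs = max(self.model_runner.cuda_graphs)
        # The assignment arrives too late (or was renamed by a refactor).
        self.model_runner = object()

try:
    Engine()
except AttributeError as exc:
    print(exc)  # 'Engine' object has no attribute 'model_runner'
```

This is only an illustration of the error class; whether the real cause is an ordering change or a rename in the recent commits would need to be confirmed against `balance_serve.py`.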
Steps to reproduce
- Pull the latest code and build it with MAX_JOBS=44 USE_BALANCE_SERVE=1 bash ./install.sh;
- Run the launch command: python ktransformers/server/main.py --port 6399 --architectures Qwen3MoeForCausalLM --model_path /work/ktransformers/models/Q3Config/Qwen3-Coder-480B-A35B-Instruct/ --gguf_path /models/Qwen3/Qwen3-Coder-480B-A35B/ --model_name qwen3-coder --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml --max_new_tokens 8192 --cache_lens 49152 --chunk_size 512 --cache_q4 true --max_batch_size 1 --backend_type balance_serve
- The error above is raised
Environment
Intel(R) Xeon(R) Platinum 8461V + 3090 24GB
Qwen3Moe-serve.yaml is unmodified; AMX is disabled and prefix caching is disabled
Code version: main branch, commit eb9008a
I hit the same problem when launching the QWEN3 235B A22 GGUF model.