[Bug] Service fails to start after build: AttributeError: 'Engine' object has no attribute 'model_runner'
CodeZ-Hao opened this issue
Checklist
- 1. I have searched for related issues but did not find the help I needed
- 2. The problem has not been fixed in the latest version
- 3. I understand that if a bug report lacks environment information and a minimal reproducible example, it will be hard to reproduce and locate the problem, reducing the chance of getting feedback
- 4. If this is a question rather than a bug, I should start a discussion at https://github.com/kvcache-ai/ktransformers/discussions instead; otherwise the issue will be closed
- 5. To facilitate community communication, I will use Chinese/English or attach a Chinese/English translation (if using another language). Non-Chinese/English content without a translation may be closed
Problem description
After pulling and building the latest commit, the server fails to start. With the same environment, configuration, and command, the code before the pull ran successfully.
The error output is as follows:
loading model.layers.58.post_attention_layernorm.weight to cuda
loading model.layers.59.self_attn.q_norm.weight to cuda
loading model.layers.59.self_attn.k_norm.weight to cuda
loading model.layers.59.input_layernorm.weight to cuda
loading model.layers.59.post_attention_layernorm.weight to cuda
loading model.layers.60.self_attn.q_norm.weight to cuda
loading model.layers.60.self_attn.k_norm.weight to cuda
loading model.layers.60.input_layernorm.weight to cuda
loading model.layers.60.post_attention_layernorm.weight to cuda
loading model.layers.61.self_attn.q_norm.weight to cuda
loading model.layers.61.self_attn.k_norm.weight to cuda
loading model.layers.61.input_layernorm.weight to cuda
loading model.layers.61.post_attention_layernorm.weight to cuda
loading model.norm.weight to cuda
Getting inference context from sched_client.
sched_rpc started with PID: 969719
Got inference context, sending it to subscribers.
Rebuilding kvcache
62
kv_cache loaded successfully.
Process SpawnProcess-1:
Traceback (most recent call last):
File "/root/miniconda3/envs/kt/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/root/miniconda3/envs/kt/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 403, in run_engine
engine = Engine(args, token_queue, broadcast_endpoint, kvcache_event)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/miniconda3/envs/kt/lib/python3.11/site-packages/ktransformers/server/backend/interfaces/balance_serve.py", line 280, in __init__
self.model.init_wrapper(self.args.use_cuda_graph, self.device, max(self.model_runner.cuda_graphs), args.max_batch_size, self.block_num)
^^^^^^^^^^^^^^^^^
AttributeError: 'Engine' object has no attribute 'model_runner'
^CReceived signal 2, shutting down...
Cleaning up...
Terminating sched_process 969719
/root/miniconda3/envs/kt/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 13 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
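For context, the traceback shows that `Engine.__init__` evaluates `max(self.model_runner.cuda_graphs)` at a point where the instance has no `model_runner` attribute, which is the classic read-before-assignment (or renamed-attribute) pattern. A minimal sketch of this general Python failure mode, using hypothetical names rather than the actual ktransformers code:

```python
class Engine:
    def __init__(self):
        self.device = "cuda"
        # Hypothetical reproduction: self.model_runner is read here,
        # but it has not been assigned on the instance yet, so Python
        # raises AttributeError at this line.
        graphs = max(self.model_runner.cuda_graphs)
        # The assignment arrives too late (or was renamed by a refactor).
        self.model_runner = object()

try:
    Engine()
except AttributeError as exc:
    print(exc)  # 'Engine' object has no attribute 'model_runner'
```

This is only an illustration of the error class; whether the real cause is an ordering change or a rename in the recent commits would need to be confirmed against `balance_serve.py`.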
Steps to reproduce
- Pull the latest code and build it with MAX_JOBS=44 USE_BALANCE_SERVE=1 bash ./install.sh;
- Run the launch command: python ktransformers/server/main.py --port 6399 --architectures Qwen3MoeForCausalLM --model_path /work/ktransformers/models/Q3Config/Qwen3-Coder-480B-A35B-Instruct/ --gguf_path /models/Qwen3/Qwen3-Coder-480B-A35B/ --model_name qwen3-coder --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve.yaml --max_new_tokens 8192 --cache_lens 49152 --chunk_size 512 --cache_q4 true --max_batch_size 1 --backend_type balance_serve
- The error above is raised
Environment
Intel(R) Xeon(R) Platinum 8461V + 3090 24GB
Qwen3Moe-serve.yaml is unmodified; AMX is disabled and prefix caching is disabled
Code version: main branch, commit eb9008a
I hit the same problem when launching the QWEN3 235B A22 GGUF model.