OpenBMB / MiniCPM-V

MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone

Repository from GitHub: https://github.com/OpenBMB/MiniCPM-V

Command line usage

Trouble123 opened this issue · comments

Hi,
I am new to local LLMs and really interested in getting MiniCPM to work from the command line. Can anyone share how to do this, or even point me in the right direction?

Thanks

commented

If you're just starting out, I recommend using our cookbook to learn how to use it.
https://github.com/OpenSQZ/MiniCPM-V-CookBook

If you want to use it locally, you can run it using llama.cpp. Refer to this documentation.
https://minicpm-o.readthedocs.io/zh-cn/latest/run_locally/llama.cpp.html
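For a rough idea of what that looks like, a typical invocation is sketched below. The binary name, GGUF file names, and prompt are placeholders that depend on your llama.cpp version and which quantized build you download, so take the exact command from the linked page.

# Hypothetical example: run a MiniCPM-V GGUF build with llama.cpp's multimodal CLI.
# The binary and file names below are assumptions, not the official command.
./llama-mtmd-cli -m ./MiniCPM-V-Q4_K_M.gguf --mmproj ./mmproj-model-f16.gguf \
  --image ./demo.jpg -p "Describe this image in detail."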

Thank you, that's very helpful.

With the example of using llama.cpp, I only see --image as a source, but I am looking to use this for video.
Is there a --video option?

Thanks

commented

That's right, llama.cpp only takes images for now; the framework doesn't yet support video understanding. You can use vLLM for video understanding. The documentation is below.
https://minicpm-o.readthedocs.io/en/latest/deployment/vllm.html
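Once the server is running, video goes in through the OpenAI-compatible chat API rather than a CLI flag. A minimal client sketch, assuming the server is on localhost:8000 and the served model accepts the video_url content type (model name, port, API key, and video URL below are placeholders):

# Hypothetical client call against a local vLLM OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token-abc123")

response = client.chat.completions.create(
    model="openbmb/MiniCPM-V-2_6",  # must match the name the server was started with
    messages=[{
        "role": "user",
        "content": [
            {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}},
            {"type": "text", "text": "Summarize what happens in this video."},
        ],
    }],
)
print(response.choices[0].message.content)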

Hi, I spun up a DigitalOcean droplet which was set up for vLLM, and then ran the second command (vLLM was already installed).
When I ran the third command (with --port 8888 added) I get the error below. I tried changing the max model len to 1024; same issue.

The droplet has an RX6000 with 48 GB of VRAM and 64 GB of RAM.
vllm serve model/ --dtype auto --max-model-len 2048 --api-key token-abc123 --gpu_memory_utilization 0.9 --trust-remote-code --port 8888

INFO 08-17 06:44:40 [__init__.py:235] Automatically detected platform cuda.
INFO 08-17 06:44:42 [api_server.py:1755] vLLM API server version 0.10.0
INFO 08-17 06:44:42 [cli_args.py:261] non-default args: {'model_tag': 'model/', 'port': 8888, 'api_key': 'token-abc123', 'model': 'model/', 'trust_remote_code': True, 'max_model_len': 2048}
INFO 08-17 06:44:46 [config.py:1604] Using max model len 2048
INFO 08-17 06:44:46 [config.py:2434] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 08-17 06:44:49 [__init__.py:235] Automatically detected platform cuda.
INFO 08-17 06:44:50 [core.py:572] Waiting for init message from front-end.
INFO 08-17 06:44:50 [core.py:71] Initializing a V1 LLM engine (v0.10.0) with config: model='model/', speculative_config=None, tokenizer='model/', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=model/, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":512,"local_cache_dir":null}
INFO 08-17 06:44:52 [parallel_state.py:1102] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
<unknown>:313: SyntaxWarning: invalid escape sequence '\('
<unknown>:315: SyntaxWarning: invalid escape sequence '\('
/root/.cache/huggingface/modules/transformers_modules/processing_minicpmo.py:313: SyntaxWarning: invalid escape sequence '\('
  image_pattern = "\(<image>./</image>\)"
/root/.cache/huggingface/modules/transformers_modules/processing_minicpmo.py:315: SyntaxWarning: invalid escape sequence '\('
  audio_pattern = "\(<audio>./</audio>\)"
/root/.miniconda3/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:640: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
  warnings.warn(
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
<unknown>:313: SyntaxWarning: invalid escape sequence '\('
<unknown>:315: SyntaxWarning: invalid escape sequence '\('
<unknown>:313: SyntaxWarning: invalid escape sequence '\('
<unknown>:315: SyntaxWarning: invalid escape sequence '\('
WARNING 08-17 06:44:57 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 08-17 06:44:57 [gpu_model_runner.py:1843] Starting to load model model/...
INFO 08-17 06:44:57 [gpu_model_runner.py:1875] Loading model from scratch...
INFO 08-17 06:44:57 [cuda.py:290] Using Flash Attention backend on V1 engine.
INFO 08-17 06:44:57 [cuda.py:307] Using FlexAttention backend for head_size=72 on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:16<00:48, 16.30s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:42<00:43, 21.89s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [01:07<00:23, 23.64s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:30<00:00, 23.19s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [01:30<00:00, 22.58s/it]

INFO 08-17 06:46:28 [default_loader.py:262] Loading weights took 90.37 seconds
INFO 08-17 06:46:28 [gpu_model_runner.py:1892] Model loading took 15.7985 GiB and 90.706682 seconds
INFO 08-17 06:46:28 [gpu_model_runner.py:2380] Encoder cache will be initialized with a budget of 2048 tokens, and profiled with 2 audio items of the maximum feature size.
ERROR 08-17 06:46:28 [core.py:632] EngineCore failed to start.
ERROR 08-17 06:46:28 [core.py:632] Traceback (most recent call last):
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 623, in run_engine_core
ERROR 08-17 06:46:28 [core.py:632] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 441, in __init__
ERROR 08-17 06:46:28 [core.py:632] super().__init__(vllm_config, executor_class, log_stats,
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 86, in __init__
ERROR 08-17 06:46:28 [core.py:632] self._initialize_kv_caches(vllm_config)
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 158, in _initialize_kv_caches
ERROR 08-17 06:46:28 [core.py:632] self.model_executor.determine_available_memory())
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
ERROR 08-17 06:46:28 [core.py:632] output = self.collective_rpc("determine_available_memory")
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
ERROR 08-17 06:46:28 [core.py:632] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/utils/__init__.py", line 2985, in run_method
ERROR 08-17 06:46:28 [core.py:632] return func(*args, **kwargs)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 08-17 06:46:28 [core.py:632] return func(*args, **kwargs)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 233, in determine_available_memory
ERROR 08-17 06:46:28 [core.py:632] self.model_runner.profile_run()
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2403, in profile_run
ERROR 08-17 06:46:28 [core.py:632] dummy_encoder_outputs = self.model.get_multimodal_embeddings(
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmv.py", line 906, in get_multimodal_embeddings
ERROR 08-17 06:46:28 [core.py:632] return self._process_multimodal_inputs(modalities)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 770, in _process_multimodal_inputs
ERROR 08-17 06:46:28 [core.py:632] audio_features = self._process_audio_input(audio_input)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 762, in _process_audio_input
ERROR 08-17 06:46:28 [core.py:632] return self.get_audio_hidden_states(audio_input)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 672, in get_audio_hidden_states
ERROR 08-17 06:46:28 [core.py:632] audio_states = self.apm(
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 08-17 06:46:28 [core.py:632] return self._call_impl(*args, **kwargs)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 08-17 06:46:28 [core.py:632] return forward_call(*args, **kwargs)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 482, in forward
ERROR 08-17 06:46:28 [core.py:632] layer_outputs = encoder_layer(
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 08-17 06:46:28 [core.py:632] return self._call_impl(*args, **kwargs)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 08-17 06:46:28 [core.py:632] return forward_call(*args, **kwargs)
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 406, in forward
ERROR 08-17 06:46:28 [core.py:632] hidden_states, attn_weights, past_key_values = self.self_attn(
ERROR 08-17 06:46:28 [core.py:632] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 08-17 06:46:28 [core.py:632] ValueError: not enough values to unpack (expected 3, got 2)
Process EngineCore_0:
Traceback (most recent call last):
File "/root/.miniconda3/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/root/.miniconda3/lib/python3.12/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 636, in run_engine_core
raise e
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 623, in run_engine_core
engine_core = EngineCoreProc(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 441, in __init__
super().__init__(vllm_config, executor_class, log_stats,
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 86, in __init__
self._initialize_kv_caches(vllm_config)
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 158, in _initialize_kv_caches
self.model_executor.determine_available_memory())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 76, in determine_available_memory
output = self.collective_rpc("determine_available_memory")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 58, in collective_rpc
answer = run_method(self.driver_worker, method, args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/utils/__init__.py", line 2985, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 233, in determine_available_memory
self.model_runner.profile_run()
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2403, in profile_run
dummy_encoder_outputs = self.model.get_multimodal_embeddings(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmv.py", line 906, in get_multimodal_embeddings
return self._process_multimodal_inputs(modalities)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 770, in _process_multimodal_inputs
audio_features = self._process_audio_input(audio_input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 762, in _process_audio_input
return self.get_audio_hidden_states(audio_input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 672, in get_audio_hidden_states
audio_states = self.apm(
^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 482, in forward
layer_outputs = encoder_layer(
^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/model_executor/models/minicpmo.py", line 406, in forward
hidden_states, attn_weights, past_key_values = self.self_attn(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 3, got 2)
[rank0]:[W817 06:46:29.683525442 ProcessGroupNCCL.cpp:1479] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Traceback (most recent call last):
File "/root/.miniconda3/bin/vllm", line 8, in <module>
sys.exit(main())
^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
args.dispatch_function(args)
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 52, in cmd
uvloop.run(run_server(args))
File "/root/.miniconda3/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/asyncio/runners.py", line 195, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/root/.miniconda3/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1791, in run_server
await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1811, in run_server_worker
async with build_async_engine_client(args, client_config) as engine_client:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
async_llm = AsyncLLM.from_vllm_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 163, in from_vllm_config
return cls(
^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 117, in __init__
self.engine_core = EngineCoreClient.make_async_mp_client(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 98, in make_async_mp_client
return AsyncMPClient(*client_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 677, in __init__
super().__init__(
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 408, in __init__
with launch_core_engines(vllm_config, executor_class,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.miniconda3/lib/python3.12/contextlib.py", line 144, in __exit__
next(self.gen)
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 697, in launch_core_engines
wait_for_engine_startup(
File "/root/.miniconda3/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 750, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Anyone able to help?

commented

Anyone able to help?

[image]

This is likely because your transformers version is too old. Check whether you have updated to a sufficiently recent transformers version; the required version is listed at the top of the documentation.
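A quick way to check what is currently installed (plain pip/Python, nothing MiniCPM-specific):

pip show transformers | grep -E "^(Name|Version):"
python -c "import transformers; print(transformers.__version__)"

Compare the reported version against the requirement in the doc.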

Thanks for that, but I seem to be hitting a compatibility issue.

In the vLLM doc (https://minicpm-o.readthedocs.io/en/latest/deployment/vllm.html) it states I need to run:
pip install vllm==0.10.1
pip install vllm[video]

Yet installing vllm 0.10.1 upgrades transformers to 4.55.0, and if I remove transformers and try to install transformers==4.44.2 it fails due to incompatibility:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
vllm 0.10.1 requires tokenizers>=0.21.1, but you have tokenizers 0.19.1 which is incompatible.
vllm 0.10.1 requires transformers>=4.55.0, but you have transformers 4.44.2 which is incompatible.

I downgraded to vllm 0.10.0 and hit a pretty similar issue:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
vllm 0.10.0 requires tokenizers>=0.21.1, but you have tokenizers 0.19.1 which is incompatible.
vllm 0.10.0 requires transformers>=4.53.2, but you have transformers 4.44.2 which is incompatible.
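(The same constraints can be confirmed directly with pip, e.g. pip show vllm | grep -i Requires and pip show transformers | grep -i Requires; both vllm 0.10.x releases declare a transformers floor well above 4.44.2, so that pin can never satisfy the resolver.)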

So I uninstalled vllm and transformers, copied the requirements_o2.6.txt file, and added these lines:
transformers==4.44.2
vllm
vllm[video]

Then I did a pip install -r on this file.

At the end of it I see this:
Collecting vllm (from -r req.txt (line 3))
Downloading vllm-0.5.3.post1-cp38-abi3-manylinux1_x86_64.whl.metadata (1.8 kB)
Collecting xformers==0.0.27 (from vllm->-r req.txt (line 3))
Downloading xformers-0.0.27-cp312-cp312-manylinux2014_x86_64.whl.metadata (1.0 kB)
Collecting vllm (from -r req.txt (line 3))
Downloading vllm-0.5.3-cp38-abi3-manylinux1_x86_64.whl.metadata (1.8 kB)
Downloading vllm-0.5.2-cp38-abi3-manylinux1_x86_64.whl.metadata (1.8 kB)
Downloading vllm-0.5.1.tar.gz (790 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 790.6/790.6 kB 29.7 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting lm-format-enforcer==0.10.1 (from vllm->-r req.txt (line 3))
Downloading lm_format_enforcer-0.10.1-py3-none-any.whl.metadata (16 kB)
Collecting outlines>=0.0.43 (from vllm->-r req.txt (line 3))
Downloading outlines-1.2.3-py3-none-any.whl.metadata (28 kB)
Collecting vllm (from -r req.txt (line 3))
Downloading vllm-0.5.0.post1.tar.gz (743 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 743.2/743.2 kB 22.9 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Downloading vllm-0.5.0.tar.gz (726 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 726.3/726.3 kB 27.4 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Downloading vllm-0.4.3.tar.gz (693 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 693.2/693.2 kB 26.3 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting outlines==0.0.34 (from vllm->-r req.txt (line 3))
Downloading outlines-0.0.34-py3-none-any.whl.metadata (13 kB)
Collecting vllm (from -r req.txt (line 3))
Downloading vllm-0.4.2.tar.gz (588 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 588.8/588.8 kB 21.4 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting tiktoken==0.6.0 (from vllm->-r req.txt (line 3))
Downloading tiktoken-0.6.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting lm-format-enforcer==0.9.8 (from vllm->-r req.txt (line 3))
Downloading lm_format_enforcer-0.9.8-py3-none-any.whl.metadata (14 kB)
Collecting vllm-nccl-cu12<2.19,>=2.18 (from vllm->-r req.txt (line 3))
Downloading vllm_nccl_cu12-2.18.1.0.4.0.tar.gz (6.2 kB)
Preparing metadata (setup.py) ... done
Collecting vllm (from -r req.txt (line 3))
Downloading vllm-0.4.1.tar.gz (534 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 534.8/534.8 kB 17.0 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Downloading vllm-0.3.3.tar.gz (315 kB)
Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> [8 lines of output]
Collecting ninja
Using cached ninja-1.13.0-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (5.1 kB)
Collecting packaging
Using cached packaging-25.0-py3-none-any.whl.metadata (3.3 kB)
Collecting setuptools>=49.4.0
Using cached setuptools-80.9.0-py3-none-any.whl.metadata (6.6 kB)
ERROR: Could not find a version that satisfies the requirement torch==2.1.2 (from versions: 2.2.0, 2.2.1, 2.2.2, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 2.5.0, 2.5.1, 2.6.0, 2.7.0, 2.7.1, 2.8.0)
ERROR: No matching distribution found for torch==2.1.2
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
So I am not sure how anyone can get this working with these requirements.

commented

I suspect that the vllm==0.10.1 requirement means you are reading the documentation for MiniCPM-V 4.5, while your model is MiniCPM-o 2.6; that mismatch is the cause of the discrepancy. I recommend following the documentation for your specific model exactly as written; that is the most reliable way to get it working.
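In practice that means starting from a clean environment and installing exactly what the MiniCPM-o 2.6 page lists. A rough sketch (the version pins must come from that page, and the model name and flags below mirror the command used earlier in this thread rather than anything official):

# Sketch only: pin vllm/transformers to the versions the MiniCPM-o 2.6 doc lists.
python -m venv minicpm-o-env
source minicpm-o-env/bin/activate
pip install vllm        # replace with the exact pinned version from the doc
vllm serve openbmb/MiniCPM-o-2_6 --trust-remote-code --max-model-len 2048 --port 8888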