ModelCloud / GPTQModel

GPTQ-based LLM model compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.


[BUG] tip/main has regression on optional vllm depend

Qubitium opened this issue · comments

vLLM support was merged into main, but runtime dependency issues remain. vLLM is a large package, so we will not force a dependency on it. vLLM should be imported at runtime and, if it is not installed, raise an error that prompts users to install it (see the sketch after the traceback below).

Reported by @FrederikHandberg

```
Traceback (most recent call last):
  File "/workspace/GPTQModel/quant.py", line 2, in <module>
    from gptqmodel import GPTQModel, QuantizeConfig
  File "/workspace/GPTQModel/gptqmodel/__init__.py", line 1, in <module>
    from .models import GPTQModel
  File "/workspace/GPTQModel/gptqmodel/models/__init__.py", line 1, in <module>
    from .auto import MODEL_MAP, GPTQModel
  File "/workspace/GPTQModel/gptqmodel/models/auto.py", line 5, in <module>
    from .baichuan import BaiChuanGPTQ
  File "/workspace/GPTQModel/gptqmodel/models/baichuan.py", line 1, in <module>
    from .base import BaseGPTQModel
  File "/workspace/GPTQModel/gptqmodel/models/base.py", line 36, in <module>
    from ..utils.vllm import load_model_by_vllm, vllm_generate
  File "/workspace/GPTQModel/gptqmodel/utils/vllm.py", line 12, in <module>
    def convert_hf_params_to_vllm(hf_params: Dict[str, Any]) -> SamplingParams:
NameError: name 'SamplingParams' is not defined
```
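
The intended fix is to defer the vLLM import to call time. Below is a minimal sketch of such a guard; the `VLLM_AVAILABLE` flag and the exact parameter mapping are illustrative assumptions, not the actual GPTQModel implementation:

```python
# Minimal sketch of a runtime guard for the optional vLLM dependency.
# VLLM_AVAILABLE and the parameter mapping are illustrative assumptions,
# not the actual GPTQModel code.
from __future__ import annotations  # defer annotation evaluation so the
# SamplingParams hint below cannot raise NameError when vLLM is absent

from typing import Any, Dict

try:
    from vllm import SamplingParams
    VLLM_AVAILABLE = True
except ImportError:
    VLLM_AVAILABLE = False


def convert_hf_params_to_vllm(hf_params: Dict[str, Any]) -> SamplingParams:
    if not VLLM_AVAILABLE:
        # Fail at call time with an actionable message, not at import time.
        raise ImportError(
            "vLLM is not installed. Run `pip install vllm` to enable "
            "vLLM-backed inference."
        )
    # Map the subset of HF generation kwargs that vLLM understands.
    return SamplingParams(
        temperature=hf_params.get("temperature", 1.0),
        top_p=hf_params.get("top_p", 1.0),
        max_tokens=hf_params.get("max_new_tokens", 16),
    )
```

Note that the traceback fails on the `def` line itself: without deferred annotations, Python evaluates the `SamplingParams` return hint at import time, which is exactly the `NameError` reported. The availability check then turns a hard import failure into an actionable error for users who have not installed vLLM.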

@FrederikHandberg The vLLM regression on main should now be fixed. We will work on the 8-bit issue next; please use 4-bit for now. Our tests have shown that 4-bit is as accurate as 8-bit and much more performant.
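
For reference, a hedged sketch of selecting 4-bit quantization as suggested, using the `GPTQModel`/`QuantizeConfig` API from the import above; the model path is a placeholder and `group_size=128` is a common-default assumption, so check the GPTQModel docs for your version:

```python
# Illustrative only: switch to 4-bit quantization per the advice above.
# The model path is a placeholder; group_size=128 is a common GPTQ default.
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load("/path/to/model", quant_config)  # hypothetical path
```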