[BUG] tip/main has a regression on the optional vllm dependency
Qubitium opened this issue
Qubitium-ModelCloud commented
vLLM support was merged into main, but runtime dependency issues remain. vLLM is a large package, so we will not make it a hard dependency. It should be imported at runtime, and if it is not installed, the import should fail with an error that prompts the user to install it.
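For illustration, the runtime-import behavior described above could look roughly like the sketch below. This is not the actual GPTQModel fix: `require_optional` and `load_model_by_vllm_sketch` are hypothetical names, and the `pip install vllm` hint is an assumption.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency on first use, or raise a helpful error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user how to fix the problem.
        raise ImportError(
            f"`{module_name}` is required for this feature but is not installed. "
            f"Install it with: {install_hint}"
        ) from exc


def load_model_by_vllm_sketch(model_path: str, **kwargs):
    # Hypothetical stand-in for a vLLM loader: vllm is imported only when
    # this function is called, never when the package itself is imported.
    vllm = require_optional("vllm", "pip install vllm")
    return vllm.LLM(model=model_path, **kwargs)
```

With this shape, `from gptqmodel import GPTQModel` would not touch vLLM at all; only users who call the vLLM code path see the install prompt.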
Reported by @FrederikHandberg
Traceback (most recent call last):
  File "/workspace/GPTQModel/quant.py", line 2, in <module>
    from gptqmodel import GPTQModel, QuantizeConfig
  File "/workspace/GPTQModel/gptqmodel/__init__.py", line 1, in <module>
    from .models import GPTQModel
  File "/workspace/GPTQModel/gptqmodel/models/__init__.py", line 1, in <module>
    from .auto import MODEL_MAP, GPTQModel
  File "/workspace/GPTQModel/gptqmodel/models/auto.py", line 5, in <module>
    from .baichuan import BaiChuanGPTQ
  File "/workspace/GPTQModel/gptqmodel/models/baichuan.py", line 1, in <module>
    from .base import BaseGPTQModel
  File "/workspace/GPTQModel/gptqmodel/models/base.py", line 36, in <module>
    from ..utils.vllm import load_model_by_vllm, vllm_generate
  File "/workspace/GPTQModel/gptqmodel/utils/vllm.py", line 12, in <module>
    def convert_hf_params_to_vllm(hf_params: Dict[str, Any]) -> SamplingParams:
NameError: name 'SamplingParams' is not defined
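A note on why this surfaces as a `NameError` at import time: Python evaluates the return annotation `-> SamplingParams` when the `def` statement runs, so the name must already be bound at module level. The traceback is consistent with a guarded vllm import that failed silently, but that is an assumption, since the contents of `vllm.py` are not shown here. A minimal sketch of one way to avoid it, using a string annotation and a deferred import (the parameter mapping in the body is illustrative only, not the project's code):

```python
from typing import TYPE_CHECKING, Any, Dict

if TYPE_CHECKING:
    # Seen only by static type checkers; never executed at runtime.
    from vllm import SamplingParams


def convert_hf_params_to_vllm(hf_params: Dict[str, Any]) -> "SamplingParams":
    # The quoted annotation is not evaluated at definition time, so this
    # module imports cleanly even when vllm is absent.
    try:
        from vllm import SamplingParams
    except ImportError as exc:
        raise ImportError(
            "vllm is not installed; install it with `pip install vllm`"
        ) from exc
    # Assumption: an illustrative mapping of a few HF-style generation
    # parameters onto vLLM's SamplingParams.
    return SamplingParams(
        temperature=hf_params.get("temperature", 1.0),
        top_p=hf_params.get("top_p", 1.0),
        max_tokens=hf_params.get("max_new_tokens", 16),
    )
```

Defining the function now succeeds without vllm; the error is raised only when the function is actually called.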
Qubitium-ModelCloud commented
@FrederikHandberg The vllm regression on main should now be fixed. We will work on the 8bit issue next. Please use 4bit for now. Our tests have shown that 4bit is as accurate as 8bit and performs much better.