usyd-fsalab / fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Build error with Qwen/Qwen1.5-14B-Chat

yoke233 opened this issue

commented
>>> mii.pipeline("Qwen/Qwen1.5-14B-Chat", quantization_mode='wf6af16')
Fetching 14 files: 100%|███████████████████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 53723.93it/s]
[2024-03-13 19:07:07,296] [INFO] [engine_v2.py:82:__init__] Building model...
[2024-03-13 19:07:07,307] [INFO] [huggingface_engine.py:109:parameters] Loading checkpoint: /data/xinference/hf/hub/models--Qwen--Qwen1.5-14B-Chat/snapshots/17e11c306ed235e970c9bb8e5f7233527140cdcf/model-00008-of-00008.safetensors
[2024-03-13 19:07:07,563] [INFO] [huggingface_engine.py:109:parameters] Loading checkpoint: /data/xinference/hf/hub/models--Qwen--Qwen1.5-14B-Chat/snapshots/17e11c306ed235e970c9bb8e5f7233527140cdcf/model-00002-of-00008.safetensors
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/mii/api.py", line 207, in pipeline
    inference_engine = load_model(model_config)
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/mii/modeling/models.py", line 17, in load_model
    inference_engine = build_hf_engine(
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/engine_factory.py", line 129, in build_hf_engine
    return InferenceEngineV2(policy, engine_config)
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/engine_v2.py", line 83, in __init__
    self._model = self._policy.build_model(self._config, self._base_mp_group)
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 157, in build_model
    self.populate_model_parameters()
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 199, in populate_model_parameters
    container_map.map_param(name, parameter)
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_policy_base.py", line 73, in map_param
    self._transformer_params[layer_idx].set_dependency(".".join(popped_name.split(".")[1:]), parameter)
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/layer_container_base.py", line 318, in set_dependency
    setattr(target_param, target_dependency_name, dep_value)
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/parameter_base.py", line 38, in param_setter
    self.complete_component()
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/parameter_base.py", line 163, in complete_component
    finalized_param = self.finalize()
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/common_parameters/mlp_parameters.py", line 81, in finalize
    return self.inference_model.transform_mlp_2_param(self.params)
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/model_implementations/inference_transformer_base.py", line 497, in transform_mlp_2_param
    param = self.mlp_2.transform_param(param)
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/modules/implementations/linear/quantized_linear.py", line 172, in transform_param
    quantized_fake_fp6, scales = self.quantizer(param, num_bits=6, exp_bits=3)
  File "/data/anaconda/envs/mii/lib/python3.10/site-packages/deepspeed/inference/v2/modules/implementations/linear/quantized_linear.py", line 58, in fp_quantize
    assert input.dtype == torch.float16
AssertionError

Sorry for the late response. I think more engineering effort is required to support more models in DeepSpeed.