[ERROR]: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

Question

OualidBougzime opened this issue a month ago

Describe the issue

When running inference with my LoRA fine-tuned LLaVA model (merged weights), I hit the error below. Here's the code I used:

from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model

# Path to your fine-tuned model
fine_tuned_model_path = "../merged_model_llava_lora"

# Load the fine-tuned model
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=fine_tuned_model_path,
    model_base=None,  # Adjust if necessary based on your training configuration
    model_name=get_model_name_from_path(fine_tuned_model_path)
)

# Evaluation setup
prompt = "Identify the most relevant subclasses within the field of additive manufacturing or 4D Printing based on the given class: \"print_technology\", supplementary context, and image analysis.\n    \n    <|user|>: Parent class: \"print_technology\"\nThe Supplementary context:\n['Fused deposition modeling (FDM) remains the most common 3D printing technology due to its cost-effectiveness and material versatility.', 'Selective laser sintering (SLS) allows for the creation of complex geometries without the need for support structures, enhancing design freedom.', 'Stereolithography (SLA) is renowned for its high resolution and surface finish, making it ideal for detailed prototypes.', 'The emergence of digital light processing (DLP) has improved the speed of the photopolymerization process, significantly reducing printing time.', 'Multi-jet fusion (MJF) offers improved mechanical properties and uniformity compared to traditional layer-based printing methods.']['The application of continuous fiber fabrication (CFF) technology in 3D printing enables the production of parts with enhanced structural integrity.', 'Binder jetting technology is being explored for its potential in mass production due to its ability to rapidly produce multiple parts simultaneously.', 'The development of hybrid printing technologies that combine additive and subtractive processes could revolutionize production efficiency.', 'Advancements in direct energy deposition (DED) technology allow for the repair of high-value components in aerospace and defense industries.', 'Electron beam melting (EBM) technology provides unique advantages in the processing of high-strength titanium alloys for medical implants.']\n    <|assistant|>"
image_file = "../Dondl et al. - 2019 - Simultaneous elastic shape optimization for a doma_image_2.jpg"

# Set up evaluation arguments
args = type('Args', (), {
    "model_path": fine_tuned_model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(fine_tuned_model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()

# Perform evaluation with the fine-tuned model
eval_model(args)

The error message I received is as follows:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[36], line 35
     20 args = type('Args', (), {
     21     "model_path": fine_tuned_model_path,
     22     "model_base": None,
   (...)
     31     "max_new_tokens": 512
     32 })()
     34 # Perform evaluation with the fine-tuned model
---> 35 eval_model(args)

File ~/FineTuneVLLM/FineTune/LLaVA/llava/eval/run_llava.py:115, in eval_model(args)
    108 input_ids = (
    109     tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
    110     .unsqueeze(0)
    111     .cuda()
    112 )
    114 with torch.inference_mode():
--> 115     output_ids = model.generate(
    116         input_ids,
    117         images=images_tensor,
    118         image_sizes=image_sizes,
    119         do_sample=True if args.temperature > 0 else False,
    120         temperature=args.temperature,
    121         top_p=args.top_p,
    122         num_beams=args.num_beams,
    123         max_new_tokens=args.max_new_tokens,
    124         use_cache=True,
    125     )
    127 outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
    128 print(outputs)

File /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/FineTuneVLLM/FineTune/LLaVA/llava/model/language_model/llava_llama.py:125, in LlavaLlamaForCausalLM.generate(self, inputs, images, image_sizes, **kwargs)
    115     raise NotImplementedError("`inputs_embeds` is not supported")
    117 if images is not None:
    118     (
    119         inputs,
    120         position_ids,
    121         attention_mask,
    122         _,
    123         inputs_embeds,
    124         _
--> 125     ) = self.prepare_inputs_labels_for_multimodal(
    126         inputs,
    127         position_ids,
    128         attention_mask,
    129         None,
    130         None,
    131         images,
    132         image_sizes=image_sizes
    133     )
    134 else:
    135     inputs_embeds = self.get_model().embed_tokens(inputs)

File ~/FineTuneVLLM/FineTune/LLaVA/llava/model/llava_arch.py:202, in LlavaMetaForCausalLM.prepare_inputs_labels_for_multimodal(self, input_ids, position_ids, attention_mask, past_key_values, labels, images, image_sizes)
    200         raise ValueError(f"Unexpected mm_patch_merge_type: {self.config.mm_patch_merge_type}")
    201 else:
--> 202     image_features = self.encode_images(images)
    204 # TODO: image start / end is not implemented here to support pretraining.
    205 if getattr(self.config, 'tune_mm_mlp_adapter', False) and getattr(self.config, 'mm_use_im_start_end', False):

File ~/FineTuneVLLM/FineTune/LLaVA/llava/model/llava_arch.py:142, in LlavaMetaForCausalLM.encode_images(self, images)
    140 def encode_images(self, images):
    141     image_features = self.get_model().get_vision_tower()(images)
--> 142     image_features = self.get_model().mm_projector(image_features)
    143     return image_features

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/container.py:215, in Sequential.forward(self, input)
    213 def forward(self, input):
    214     for module in self:
--> 215         input = module(input)
    216     return input

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
   1516     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517 else:
-> 1518     return self._call_impl(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
   1522 # If we don't have any hooks, we want to skip the rest of the logic in
   1523 # this function, and just call forward.
   1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1525         or _global_backward_pre_hooks or _global_backward_hooks
   1526         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527     return forward_call(*args, **kwargs)
   1529 try:
   1530     result = None

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

Could someone assist me with resolving this issue?

Cristian Gutiérrez · Answer 1 · Fri May 24 2024 19:52:26 GMT+0800 (China Standard Time)

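Move the model to CUDA. You are probably running in float16, and the error means a linear layer (here the mm_projector, per the traceback) executed on the CPU, where your PyTorch build has no half-precision addmm kernel. Either get that module onto the GPU or cast it to float32 if it has to stay on the CPU.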
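
As a minimal sketch of that advice (not LLaVA's own code): the snippet below builds a small stand-in for the projector, lists where each parameter lives, and then picks a device/dtype pair for which a matmul kernel exists. The helper names describe_placement and place_for_inference are hypothetical and used only for illustration. In the real setup, the same inspection could be run on model.get_model().mm_projector, the module named in the traceback.

```python
import torch
from torch import nn


def describe_placement(module: nn.Module):
    """Hypothetical helper: list (name, device type, dtype) for every parameter."""
    return [(name, p.device.type, p.dtype) for name, p in module.named_parameters()]


def place_for_inference(module: nn.Module):
    """Hypothetical helper: use float16 only on CUDA, fall back to float32 on CPU."""
    if torch.cuda.is_available():
        device, dtype = torch.device("cuda"), torch.float16
    else:
        device, dtype = torch.device("cpu"), torch.float32
    return module.to(device=device, dtype=dtype), device, dtype


# Stand-in for the projector: an nn.Sequential of Linear layers, as in the
# traceback, cast to half precision while still sitting on the CPU.
projector = nn.Sequential(nn.Linear(8, 16), nn.GELU(), nn.Linear(16, 4)).half()
print(describe_placement(projector))  # parameters are on cpu with torch.float16

# Move/cast the module, then build the input with the same device and dtype.
projector, device, dtype = place_for_inference(projector)
features = torch.randn(1, 8, device=device, dtype=dtype)

with torch.inference_mode():
    out = projector(features)

print(out.shape, out.dtype, out.device)
```

If the first listing shows cpu together with torch.float16 for the real model, that matches the error in the question. The inputs must end up on the same device and in the same dtype as the module, which is why the sketch creates features from the values the helper returns.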