[ERROR]: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
OualidBougzime commented
Describe the issue
When running inference with my LLaVA model fine-tuned with LoRA, I get an error. Here is the code I used:
from llava.model.builder import load_pretrained_model
from llava.mm_utils import get_model_name_from_path
from llava.eval.run_llava import eval_model
# Path to your fine-tuned model
fine_tuned_model_path = "../merged_model_llava_lora"
# Load the fine-tuned model
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=fine_tuned_model_path,
    model_base=None,  # Adjust if necessary based on your training configuration
    model_name=get_model_name_from_path(fine_tuned_model_path)
)
# Evaluation setup
prompt = "Identify the most relevant subclasses within the field of additive manufacturing or 4D Printing based on the given class: \"print_technology\", supplementary context, and image analysis.\n \n <|user|>: Parent class: \"print_technology\"\nThe Supplementary context:\n['Fused deposition modeling (FDM) remains the most common 3D printing technology due to its cost-effectiveness and material versatility.', 'Selective laser sintering (SLS) allows for the creation of complex geometries without the need for support structures, enhancing design freedom.', 'Stereolithography (SLA) is renowned for its high resolution and surface finish, making it ideal for detailed prototypes.', 'The emergence of digital light processing (DLP) has improved the speed of the photopolymerization process, significantly reducing printing time.', 'Multi-jet fusion (MJF) offers improved mechanical properties and uniformity compared to traditional layer-based printing methods.']['The application of continuous fiber fabrication (CFF) technology in 3D printing enables the production of parts with enhanced structural integrity.', 'Binder jetting technology is being explored for its potential in mass production due to its ability to rapidly produce multiple parts simultaneously.', 'The development of hybrid printing technologies that combine additive and subtractive processes could revolutionize production efficiency.', 'Advancements in direct energy deposition (DED) technology allow for the repair of high-value components in aerospace and defense industries.', 'Electron beam melting (EBM) technology provides unique advantages in the processing of high-strength titanium alloys for medical implants.']\n <|assistant|>"
image_file = "../Dondl et al. - 2019 - Simultaneous elastic shape optimization for a doma_image_2.jpg"
# Set up evaluation arguments
args = type('Args', (), {
    "model_path": fine_tuned_model_path,
    "model_base": None,
    "model_name": get_model_name_from_path(fine_tuned_model_path),
    "query": prompt,
    "conv_mode": None,
    "image_file": image_file,
    "sep": ",",
    "temperature": 0,
    "top_p": None,
    "num_beams": 1,
    "max_new_tokens": 512
})()
# Perform evaluation with the fine-tuned model
eval_model(args)
The error message I received is as follows:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[36], line 35
20 args = type('Args', (), {
21 "model_path": fine_tuned_model_path,
22 "model_base": None,
(...)
31 "max_new_tokens": 512
32 })()
34 # Perform evaluation with the fine-tuned model
---> 35 eval_model(args)
File ~/FineTuneVLLM/FineTune/LLaVA/llava/eval/run_llava.py:115, in eval_model(args)
108 input_ids = (
109 tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt")
110 .unsqueeze(0)
111 .cuda()
112 )
114 with torch.inference_mode():
--> 115 output_ids = model.generate(
116 input_ids,
117 images=images_tensor,
118 image_sizes=image_sizes,
119 do_sample=True if args.temperature > 0 else False,
120 temperature=args.temperature,
121 top_p=args.top_p,
122 num_beams=args.num_beams,
123 max_new_tokens=args.max_new_tokens,
124 use_cache=True,
125 )
127 outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
128 print(outputs)
File /usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
112 @functools.wraps(func)
113 def decorate_context(*args, **kwargs):
114 with ctx_factory():
--> 115 return func(*args, **kwargs)
File ~/FineTuneVLLM/FineTune/LLaVA/llava/model/language_model/llava_llama.py:125, in LlavaLlamaForCausalLM.generate(self, inputs, images, image_sizes, **kwargs)
115 raise NotImplementedError("`inputs_embeds` is not supported")
117 if images is not None:
118 (
119 inputs,
120 position_ids,
121 attention_mask,
122 _,
123 inputs_embeds,
124 _
--> 125 ) = self.prepare_inputs_labels_for_multimodal(
126 inputs,
127 position_ids,
128 attention_mask,
129 None,
130 None,
131 images,
132 image_sizes=image_sizes
133 )
134 else:
135 inputs_embeds = self.get_model().embed_tokens(inputs)
File ~/FineTuneVLLM/FineTune/LLaVA/llava/model/llava_arch.py:202, in LlavaMetaForCausalLM.prepare_inputs_labels_for_multimodal(self, input_ids, position_ids, attention_mask, past_key_values, labels, images, image_sizes)
200 raise ValueError(f"Unexpected mm_patch_merge_type: {self.config.mm_patch_merge_type}")
201 else:
--> 202 image_features = self.encode_images(images)
204 # TODO: image start / end is not implemented here to support pretraining.
205 if getattr(self.config, 'tune_mm_mlp_adapter', False) and getattr(self.config, 'mm_use_im_start_end', False):
File ~/FineTuneVLLM/FineTune/LLaVA/llava/model/llava_arch.py:142, in LlavaMetaForCausalLM.encode_images(self, images)
140 def encode_images(self, images):
141 image_features = self.get_model().get_vision_tower()(images)
--> 142 image_features = self.get_model().mm_projector(image_features)
143 return image_features
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/container.py:215, in Sequential.forward(self, input)
213 def forward(self, input):
214 for module in self:
--> 215 input = module(input)
216 return input
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1518, in Module._wrapped_call_impl(self, *args, **kwargs)
1516 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]
1517 else:
-> 1518 return self._call_impl(*args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1527, in Module._call_impl(self, *args, **kwargs)
1522 # If we don't have any hooks, we want to skip the rest of the logic in
1523 # this function, and just call forward.
1524 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
1525 or _global_backward_pre_hooks or _global_backward_hooks
1526 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527 return forward_call(*args, **kwargs)
1529 try:
1530 result = None
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
Could someone assist me with resolving this issue?
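The innermost frame shows `F.linear` inside `mm_projector` dispatching to the CPU kernel (`addmm_impl_cpu_`) with float16 (`Half`) data, which suggests that at least the projector weights were still on the CPU in float16 when `generate` ran. The sketch below, in plain PyTorch, lists such parameters. `find_cpu_half_params` is a hypothetical debugging helper, not part of LLaVA, and the toy `nn.Sequential` in the usage line only stands in for the loaded model.

```python
import torch
from torch import nn


def find_cpu_half_params(model: nn.Module) -> list[str]:
    """Return the names of parameters stored as float16 ('Half') on the CPU.

    Hypothetical helper: these are the tensors that can trigger
    '"addmm_impl_cpu_" not implemented for 'Half'' on PyTorch builds whose
    CPU matrix-multiply kernels lack float16 support.
    """
    return [
        name
        for name, param in model.named_parameters()
        if param.device.type == "cpu" and param.dtype == torch.float16
    ]


if __name__ == "__main__":
    # Toy stand-in for a projector: a float16 Linear layer left on the CPU.
    toy = nn.Sequential(nn.Linear(4, 4)).half()
    print(find_cpu_half_params(toy))  # ['0.weight', '0.bias']
```

With the real model, calling `find_cpu_half_params(model.get_model().mm_projector)` (or on the whole `model`) right after loading would show whether anything was left behind on the CPU.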
Cristian Gutiérrez commented
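Move the model to CUDA. You are probably running in float16 (`Half`), and in your PyTorch build the CPU matrix-multiply kernel (`addmm_impl_cpu_`) has no float16 implementation, so float16 inference only works on the GPU.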
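Following that suggestion, here is a minimal sketch of choosing device and dtype together: keep float16 only when a CUDA device is available, and otherwise fall back to float32 on the CPU. `prepare_for_inference` is a hypothetical helper. It assumes the whole model fits on a single GPU and was not split across devices with a `device_map`; for a dispatched model, a plain `.to()` may not behave as expected.

```python
import torch
from torch import nn


def prepare_for_inference(model: nn.Module) -> tuple[nn.Module, torch.device]:
    """Move a model to a device/dtype pair whose matmul kernels exist.

    Hypothetical helper: float16 on CUDA when a GPU is available, otherwise
    float32 on the CPU, so no float16 matrix multiply ever runs on the CPU.
    Assumes the model fits on one GPU.
    """
    if torch.cuda.is_available():
        device = torch.device("cuda")
        model = model.to(device=device, dtype=torch.float16)
    else:
        device = torch.device("cpu")
        model = model.to(device=device, dtype=torch.float32)
    return model.eval(), device


if __name__ == "__main__":
    toy = nn.Linear(4, 2).half()  # starts as float16 on the CPU
    toy, device = prepare_for_inference(toy)
    # Inputs must match the weights' device and dtype.
    x = torch.randn(1, 4, device=device, dtype=next(toy.parameters()).dtype)
    with torch.inference_mode():
        print(toy(x).shape)  # torch.Size([1, 2])
```

The inputs have to follow the model: the image tensor passed to `generate` must be on the same device and in the same dtype as the weights, otherwise the error simply moves to a device or dtype mismatch.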