pinecone-io / examples

Jupyter Notebooks to help you get hands-on with Pinecone vector databases

[Bug] Example stops with error when running with GPU (CUDA) "Expected all tensors to be on the same device, but found at least two devices"

fantauzzi opened this issue

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When running the example abstractive-question-answering.ipynb, I get the following error in cell 18, on the call to generate_answer(query):

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[38], line 1
----> 1 generate_answer(query)

Cell In[37], line 5, in generate_answer(query)
      3 inputs = tokenizer([query], max_length=1024, return_tensors="pt")
      4 # use generator to predict output ids
----> 5 ids = generator.generate(inputs["input_ids"], num_beams=2, min_length=20, max_length=40)
      6 # use tokenizer to decode the output ids
      7 answer = tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/transformers/generation/utils.py:1329, in GenerationMixin.generate(self, inputs, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, assistant_model, streamer, **kwargs)
   1321         logger.warning(
   1322             "A decoder-only architecture is being used, but right-padding was detected! For correct "
   1323             "generation results, please set `padding_side='left'` when initializing the tokenizer."
   1324         )
   1326 if self.config.is_encoder_decoder and "encoder_outputs" not in model_kwargs:
   1327     # if model is encoder decoder encoder_outputs are created
   1328     # and added to `model_kwargs`
-> 1329     model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
   1330         inputs_tensor, model_kwargs, model_input_name
   1331     )
   1333 # 5. Prepare `input_ids` which will be used for auto-regressive generation
   1334 if self.config.is_encoder_decoder:

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/transformers/generation/utils.py:642, in GenerationMixin._prepare_encoder_decoder_kwargs_for_generation(self, inputs_tensor, model_kwargs, model_input_name)
    640 encoder_kwargs["return_dict"] = True
    641 encoder_kwargs[model_input_name] = inputs_tensor
--> 642 model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
    644 return model_kwargs

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/transformers/models/bart/modeling_bart.py:811, in BartEncoder.forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    808     raise ValueError("You have to specify either input_ids or inputs_embeds")
    810 if inputs_embeds is None:
--> 811     inputs_embeds = self.embed_tokens(input_ids) * self.embed_scale
    813 embed_pos = self.embed_positions(input)
    814 embed_pos = embed_pos.to(inputs_embeds.device)

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/modules/sparse.py:162, in Embedding.forward(self, input)
    161 def forward(self, input: Tensor) -> Tensor:
--> 162     return F.embedding(
    163         input, self.weight, self.padding_idx, self.max_norm,
    164         self.norm_type, self.scale_grad_by_freq, self.sparse)

File ~/.pyenv/versions/3.11.3/envs/pinecone/lib/python3.11/site-packages/torch/nn/functional.py:2210, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2204     # Note [embedding_renorm set_grad_enabled]
   2205     # XXX: equivalent to
   2206     # with torch.no_grad():
   2207     #   torch.embedding_renorm_
   2208     # remove once script supports set_grad_enabled
   2209     _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
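The failure is in the embedding lookup at the bottom of the trace: the generator's weights live on cuda:0, but the input_ids produced by the tokenizer stay on the CPU, since tokenizer(...) returns CPU tensors unless they are moved explicitly. A minimal sketch of the mismatch, assuming the vblagoje/bart_lfqa checkpoint (which I believe is the one the notebook loads; any encoder-decoder checkpoint shows the same):

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

# the model's weights are moved to the GPU, as in the notebook...
generator = BartForConditionalGeneration.from_pretrained("vblagoje/bart_lfqa").to(device)
tokenizer = BartTokenizer.from_pretrained("vblagoje/bart_lfqa")

# ...but the tokenizer returns plain CPU tensors
inputs = tokenizer(["example query"], max_length=1024, return_tensors="pt")
print(inputs["input_ids"].device)           # cpu
print(next(generator.parameters()).device)  # cuda:0 -> the two devices in the error
```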

Expected Behavior

The notebook should run without errors.

Steps To Reproduce

On a system with an NVIDIA GPU supported by PyTorch (see the quick check below):

  1. Clone the repo
  2. Start Jupyter Lab
  3. Open and run the notebook abstractive-question-answering.ipynb
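The error only reproduces when PyTorch actually selects the GPU; a quick check, matching the device-selection pattern the notebook uses (paraphrased):

```python
import torch
# the bug only occurs when this prints "cuda"
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)
```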

Relevant log output

N/A

Environment

- **OS**: Ubuntu 22.04
- **Language version**: Python 3.11.3
- **Pinecone client version**: 2.2.2

Additional Context

To fix the bug, change the following line in the function generate_answer(query):

```python
inputs = tokenizer([query], max_length=1024, return_tensors="pt")
```

to:

```python
inputs = tokenizer([query], max_length=1024, return_tensors="pt").to(device)
```

This moves the tokenized inputs onto the same device as the generator model.
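For completeness, here is generate_answer with that one-line change applied. This is a sketch reconstructed from the cell shown in the traceback; the final return line is not visible there and is assumed:

```python
def generate_answer(query):
    # tokenize the query and move the tensors to the same device as the model
    inputs = tokenizer([query], max_length=1024, return_tensors="pt").to(device)
    # use generator to predict output ids
    ids = generator.generate(inputs["input_ids"], num_beams=2, min_length=20, max_length=40)
    # use tokenizer to decode the output ids
    answer = tokenizer.batch_decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    return answer
```

Note that tokenizer(...) returns a BatchEncoding, whose .to(device) moves every tensor it holds (input_ids, attention_mask, etc.) in one call, which is why the single .to(device) is enough.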