Error when getting summary of T5 model: Expected tensor for argument #1 'indices' to have one of the following scalar types
anentropic opened this issue · comments
Hi,
I am trying to summarise a model from HuggingFace Hub
import torchinfo
from transformers import T5ForConditionalGeneration, T5Tokenizer, T5Config
config = T5Config.from_pretrained('t5-large')
input_shape = (1, config.max_length)
model = T5ForConditionalGeneration.from_pretrained('t5-large')
summary = torchinfo.summary(model, input_shape, device="cpu")
I get this error:
site-packages/transformers/models/t5/modeling_t5.py:973, in T5Stack.forward(self, input_ids, attention_mask, encoder_hidden_states, encoder_attention_mask, inputs_embeds, head_mask, cross_attn_head_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
972 assert self.embed_tokens is not None, "You have to initialize the model with valid token embeddings"
--> 973 inputs_embeds = self.embed_tokens(input_ids)
975 batch_size, seq_length = input_shape
site-packages/torch/nn/modules/module.py:1538, in Module._call_impl(self, *args, **kwargs)
1536 args = bw_hook.setup_input_hook(args)
-> 1538 result = forward_call(*args, **kwargs)
1539 if _global_forward_hooks or self._forward_hooks:
site-packages/torch/nn/modules/sparse.py:162, in Embedding.forward(self, input)
161 def forward(self, input: Tensor) -> Tensor:
--> 162 return F.embedding(
163 input, self.weight, self.padding_idx, self.max_norm,
164 self.norm_type, self.scale_grad_by_freq, self.sparse)
site-packages/torch/nn/functional.py:2210, in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
2209 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2210 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
Cell In[81], line 1
----> 1 summary = torchinfo.summary(model, input_shape, device="cpu")
site-packages/torchinfo/torchinfo.py:218, in summary(model, input_size, input_data, batch_dim, cache_forward_pass, col_names, col_width, depth, device, dtypes, mode, row_settings, verbose, **kwargs)
211 validate_user_params(
212 input_data, input_size, columns, col_width, device, dtypes, verbose
213 )
215 x, correct_input_size = process_input(
216 input_data, input_size, batch_dim, device, dtypes
217 )
--> 218 summary_list = forward_pass(
219 model, x, batch_dim, cache_forward_pass, device, model_mode, **kwargs
220 )
221 formatting = FormattingOptions(depth, verbose, columns, col_width, rows)
222 results = ModelStatistics(
223 summary_list, correct_input_size, get_total_memory_used(x), formatting
224 )
Is it just that some types of model are unsupported? Or I'm doing something wrong?
Thanks :)
The relevant part of that error message is this:
RuntimeError: Expected tensor for argument #1 'indices' to have one of the following scalar types: Long, Int; but got torch.FloatTensor instead (while checking arguments for embedding)
When you give input_shape to summary, it produces a Tensor of type float (torch.float32, going by the torch.FloatTensor in the error message). If your model needs inputs with another data type, try using input_data instead.
For example:
import torch
import torchinfo
from transformers import T5ForConditionalGeneration, T5Tokenizer, T5Config
config = T5Config.from_pretrained('t5-large')
input_data = torch.ones(1, config.max_length, dtype=torch.int) # Actively define inputs, instead of just their shapes
model = T5ForConditionalGeneration.from_pretrained('t5-large')
summary = torchinfo.summary(model, input_data=input_data, device="cpu")
Try this (you might have to change the dtype to torch.long if this does not work, since the error says the embedding only accepts Long or Int indices), and if it works, report back and close the issue :)
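To see the dtype problem in isolation, here is a minimal sketch that does not involve torchinfo or T5 at all. The embedding sizes and tensor shapes are arbitrary values picked for illustration; the only point is that an nn.Embedding lookup rejects float indices and accepts integer ones.

```python
import torch
import torch.nn as nn

# Arbitrary small embedding table, just for illustration.
embed = nn.Embedding(num_embeddings=10, embedding_dim=4)

float_ids = torch.ones(1, 3)                   # default dtype is torch.float32
long_ids = torch.ones(1, 3, dtype=torch.long)  # integer indices

# Float indices trigger the same kind of RuntimeError as in the traceback.
try:
    embed(float_ids)
    float_error = None
except RuntimeError as err:
    float_error = str(err)

# Integer indices are looked up normally.
out = embed(long_ids)

print(float_error)
print(out.shape)
```

This is why passing only a shape fails for models whose first layer is an embedding, while passing an integer tensor through input_data gets past that layer.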
Thanks for your help!
I get further this time:
site-packages/transformers/models/t5/modeling_t5.py:969, in T5Stack.forward(self, input_ids, attention_mask, encoder_hidden_states, encoder_attention_mask, inputs_embeds, head_mask, cross_attn_head_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
968 err_msg_prefix = "decoder_" if self.is_decoder else ""
--> 969 raise ValueError(f"You have to specify either {err_msg_prefix}input_ids or {err_msg_prefix}inputs_embeds")
971 if inputs_embeds is None:
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 summary = torchinfo.summary(model, input_data=input_data, device="cpu")
...
RuntimeError: Failed to run torchinfo. See above stack traces for more details. Executed layers up to: [T5Stack: 1, Embedding: 2, Dropout: 2, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Embedding: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, 
T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, 
T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5Block: 3, T5LayerSelfAttention: 5, T5LayerNorm: 6, T5Attention: 6, Linear: 7, Linear: 7, Linear: 7, Linear: 7, Dropout: 6, T5LayerFF: 5, T5LayerNorm: 6, T5DenseActDense: 6, Linear: 7, ReLU: 7, Dropout: 7, Linear: 7, Dropout: 6, T5LayerNorm: 2, Dropout: 2]
I think I need a more fleshed-out input_data: https://stackoverflow.com/questions/65140400/valueerror-you-have-to-specify-either-decoder-input-ids-or-decoder-inputs-embed
Something like (input_ids, attention_mask, decoder_input_ids). I guess our current shape is just the input_ids part?
Or it's related to T5 being a seq2seq model: https://stackoverflow.com/a/66117248/202168
This worked:
summary = torchinfo.summary(model, input_data=(input_data, input_data, input_data), device="cpu")
thanks again for your help!