jessevig / bertviz

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

Home Page: https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1

How to visualize attention if the sizes of the input and output sequences are different? ValueError.

sn0rkmaiden opened this issue · comments

I have a custom pretrained T5 model that predicts the solutions to quadratic equations, so its output is a different length than its input (in all the examples I've seen, they are the same). I'm trying to visualize attention like this:

tokenizer = AutoTokenizer.from_pretrained("my-repo/content")
model = AutoModelForSeq2SeqLM.from_pretrained("my-repo/content", output_attentions=True)

encoder_input_ids = tokenizer("7*x^2+3556*x+451612=0", return_tensors="pt", add_special_tokens=True).input_ids

outputs = model.generate(encoder_input_ids, max_length=80, min_length=10, output_attentions=True, return_dict_in_generate=True)

For example predicted sequence is: "D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0".

with tokenizer.as_target_tokenizer():
    decoder_input_ids = tokenizer("D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0", return_tensors="pt", add_special_tokens=True).input_ids

encoder_text = tokenizer.convert_ids_to_tokens(encoder_input_ids[0])
decoder_text = tokenizer.convert_ids_to_tokens(decoder_input_ids[0])

So the encoder_text length is 18 and the decoder_text length is 79.

For some reason, when I get all the attentions from the outputs, they come in the form of tuples (the cross attention is even a tuple of tuples).
I can't figure out how to call this function correctly or why my attentions have the wrong dimensions.

model_view(cross_attention=outputs.cross_attentions, encoder_attention=encoder_attention, decoder_attention=decoder_attention, encoder_tokens=encoder_text, decoder_tokens=decoder_text)
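
To illustrate what I mean, here is a quick sketch of the nesting I get from generate with return_dict_in_generate=True, using the variables from the snippet above; the attentions are grouped first by generation step and then by decoder layer:

# outputs comes from model.generate(..., output_attentions=True, return_dict_in_generate=True)
# cross_attentions has one entry per generation step; each entry is a tuple over decoder layers
print(len(outputs.cross_attentions))      # number of generation steps
print(len(outputs.cross_attentions[0]))   # number of decoder layers
# each tensor: (batch_size, num_heads, generated_length, encoder_sequence_length)
print(outputs.cross_attentions[0][0].shape)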

Is the problem that the output length is different from the input length?

Using generate returns the auto-regressive attentions. Try using teacher forcing (i.e. providing labels/decoder_input_ids) through the forward pass instead.
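
A minimal sketch of that approach, reusing the variable names from the question above: run a single forward pass with decoder_input_ids so each attention type comes back as one tuple of per-layer tensors, then pass them to model_view.

from bertviz import model_view

# Teacher forcing: one forward pass over the full target sequence,
# instead of step-by-step generation.
outputs = model(
    input_ids=encoder_input_ids,
    decoder_input_ids=decoder_input_ids,
    output_attentions=True,
    return_dict=True,
)

# Each attribute is a tuple with one tensor per layer:
#   encoder_attentions[i]: (batch, heads, src_len, src_len)
#   decoder_attentions[i]: (batch, heads, tgt_len, tgt_len)
#   cross_attentions[i]:   (batch, heads, tgt_len, src_len)
model_view(
    encoder_attention=outputs.encoder_attentions,
    decoder_attention=outputs.decoder_attentions,
    cross_attention=outputs.cross_attentions,
    encoder_tokens=encoder_text,
    decoder_tokens=decoder_text,
)

The input and output lengths don't need to match; model_view only needs each token list to line up with the corresponding attention dimensions (here 18 encoder tokens and 79 decoder tokens).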