jessevig / bertviz

BertViz: Visualize Attention in NLP Models (BERT, GPT2, BART, etc.)

Home Page: https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1

How to visualize attention if the sizes of the input and output sequences are different? ValueError.

sn0rkmaiden opened this issue · comments

I have a custom pretrained T5 model that predicts the solutions to quadratic equations, so its output is a different length than its input (in all the examples I've seen, they are the same). I'm trying to visualize attention like this:

tokenizer = AutoTokenizer.from_pretrained("my-repo/content")
model = AutoModelForSeq2SeqLM.from_pretrained("my-repo/content", output_attentions=True)

encoder_input_ids = tokenizer("7*x^2+3556*x+451612=0", return_tensors="pt", add_special_tokens=True).input_ids

outputs = model.generate(encoder_input_ids, max_length=80, min_length=10, output_attentions=True, return_dict_in_generate=True)

For example predicted sequence is: "D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0".

with tokenizer.as_target_tokenizer():
    decoder_input_ids = tokenizer("D = 3556 ^ 2 - 4 * 7 * 4 5 1 6 1 2 = 2 1 ; x 1 = ( - 3556 + ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0 ; x 2 = ( - 3556 - ( 2 1 ) * * 0. 5 ) / / ( 2 * 7 ) = - 2. 0", return_tensors="pt", add_special_tokens=True).input_ids

encoder_text = tokenizer.convert_ids_to_tokens(encoder_input_ids[0])
decoder_text = tokenizer.convert_ids_to_tokens(decoder_input_ids[0])

So the encoder_text length is 18 and the decoder_text length is 79.

For some reason, when I get all the attentions from the outputs, they come in the form of tuples (the cross attention is even a tuple of tuples).
I can't figure out how to call this function correctly or why my attentions have the wrong dimensions.

model_view(cross_attention=outputs.cross_attentions, encoder_attention=encoder_attention, decoder_attention=decoder_attention, encoder_tokens=encoder_text, decoder_tokens=decoder_text)
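
To illustrate what I mean, here is a quick sketch of the nesting I get from generate with return_dict_in_generate=True, using the variables from the snippet above; the attentions are grouped first by generation step and then by decoder layer:

# outputs comes from model.generate(..., output_attentions=True, return_dict_in_generate=True)
# cross_attentions has one entry per generation step; each entry is a tuple over decoder layers
print(len(outputs.cross_attentions))      # number of generation steps
print(len(outputs.cross_attentions[0]))   # number of decoder layers
# each tensor: (batch_size, num_heads, generated_length, encoder_sequence_length)
print(outputs.cross_attentions[0][0].shape)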

Is the problem that the output length is different from the input length?

Using generate returns the auto-regressive attentions. Try using teacher forcing (i.e. providing labels/decoder_input_ids) through the forward pass instead.
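
A minimal sketch of that approach, reusing the variable names from the question above: run a single forward pass with decoder_input_ids so each attention type comes back as one tuple of per-layer tensors, then pass them to model_view.

from bertviz import model_view

# Teacher forcing: one forward pass over the full target sequence,
# instead of step-by-step generation.
outputs = model(
    input_ids=encoder_input_ids,
    decoder_input_ids=decoder_input_ids,
    output_attentions=True,
    return_dict=True,
)

# Each attribute is a tuple with one tensor per layer:
#   encoder_attentions[i]: (batch, heads, src_len, src_len)
#   decoder_attentions[i]: (batch, heads, tgt_len, tgt_len)
#   cross_attentions[i]:   (batch, heads, tgt_len, src_len)
model_view(
    encoder_attention=outputs.encoder_attentions,
    decoder_attention=outputs.decoder_attentions,
    cross_attention=outputs.cross_attentions,
    encoder_tokens=encoder_text,
    decoder_tokens=decoder_text,
)

The input and output lengths don't need to match; model_view only needs each token list to line up with the corresponding attention dimensions (here 18 encoder tokens and 79 decoder tokens).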