Autoregressivity

Question

sdpmas opened this issue 2 years ago

I had a question about Figure 2 and equation 3 from the paper. How does the last token of each chunk C_u being able to attend to the retrieved content E_u not break autoregressivity?

Phil Wang · Answer 1 · Wed Feb 09 2022 06:30:23 GMT+0800 (China Standard Time)

so basically you have to make sure past tokens never see a future token. E_u is retrieved using only the tokens of chunk C_u, and the last token of C_u is the furthest-future token of that chunk, so it can safely attend to all of E_u without violating that rule
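
A minimal sketch of that rule, assuming 0-indexed token positions, 1-indexed chunks of a fixed size, and that E_u depends only on the tokens of C_u. The function names (`shifted_chunk`, `unshifted_chunk`, `is_causal`) are made up for illustration and are not from the paper or this repo. It checks that every token only attends to retrieved content built from tokens at or before its own position, and shows that letting a token attend to the retrieval of its own chunk without the shift would leak the future.

```python
from typing import Callable


def shifted_chunk(position: int, chunk_size: int) -> int:
    """Chunk u whose retrieved content E_u this token attends to (0 = none).

    Assumed mapping: the last token of chunk u and the first chunk_size - 1
    tokens of chunk u + 1 all attend to E_u.
    """
    return (position + 1) // chunk_size


def unshifted_chunk(position: int, chunk_size: int) -> int:
    """Hypothetical naive mapping: every token attends to its own chunk's E_u."""
    return position // chunk_size + 1


def last_position_of_chunk(u: int, chunk_size: int) -> int:
    """Position of the last token in chunk u, the latest token E_u depends on."""
    return u * chunk_size - 1


def is_causal(
    chunk_for: Callable[[int, int], int], seq_len: int, chunk_size: int
) -> bool:
    """True if no token attends to an E_u that was built from a later token."""
    for position in range(seq_len):
        u = chunk_for(position, chunk_size)
        if u == 0:
            continue  # this token attends to no retrieved content
        if last_position_of_chunk(u, chunk_size) > position:
            return False  # E_u was built from tokens this position has not seen
    return True


if __name__ == "__main__":
    print(is_causal(shifted_chunk, seq_len=16, chunk_size=4))
    print(is_causal(unshifted_chunk, seq_len=16, chunk_size=4))
```

With a chunk size of 4, position 3 (the last token of chunk 1) maps to E_1 under the shifted mapping, and E_1 depends on positions 0 to 3 only, so nothing from the future is visible. Under the unshifted mapping, position 0 would also see E_1 and therefore indirectly see positions 1 to 3.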

Phil Wang · Answer 2 · Wed Feb 09 2022 06:44:14 GMT+0800 (China Standard Time)

@sdpmas the same trick was actually used here https://arxiv.org/abs/2110.13711 (i think deepmind probably read this paper and got some inspiration tbh)

Samip Dahal · Answer 3 · Wed Feb 09 2022 07:23:34 GMT+0800 (China Standard Time)

I see, thanks a lot for the explanation!