Add self-attention encoder
Adamits opened this issue
With the decoupling of encoders and decoders, we have added a Linear encoder, which seems to just embed the inputs and pass them along. We should also add a SelfAttention encoder, which encodes the embeddings with a self-attention layer (and no positional encoding). This contextualizes the embeddings by representing each one as a linear combination of all the embeddings, weighted by its attention to each of them.
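A minimal sketch of what this could look like in PyTorch (the class name `SelfAttentionEncoder` and its constructor arguments are illustrative assumptions, not the library's actual API):

```python
import torch
from torch import nn


class SelfAttentionEncoder(nn.Module):
    """Embeds inputs and contextualizes them with one self-attention layer.

    No positional encoding is applied, so each output position is a
    position-agnostic linear combination of the value projections of
    all input positions, weighted by the attention scores.
    """

    def __init__(
        self,
        vocab_size: int,
        embedding_size: int,
        num_heads: int = 4,
        pad_idx: int = 0,
    ):
        super().__init__()
        self.embedding = nn.Embedding(
            vocab_size, embedding_size, padding_idx=pad_idx
        )
        self.attention = nn.MultiheadAttention(
            embedding_size, num_heads, batch_first=True
        )

    def forward(
        self, symbols: torch.Tensor, pad_mask: torch.Tensor
    ) -> torch.Tensor:
        # symbols: (batch, seq_len) int tensor of symbol indices.
        # pad_mask: (batch, seq_len) bool tensor, True at padding positions.
        embedded = self.embedding(symbols)
        # Self-attention: queries, keys, and values are all the embeddings.
        contextualized, _ = self.attention(
            embedded, embedded, embedded, key_padding_mask=pad_mask
        )
        return contextualized
```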
+1. Makes sense.