Add embedding layer
bclarkson-code opened this issue · comments
In the current GPT implementation, tokens are embedded by one-hot encoding them and passing them through a linear layer. Because a one-hot vector selects a single row of the weight matrix, the full matrix multiplication is unnecessary: a direct row lookup gives the same result.
We can encapsulate this in an `Embedding` layer, which should significantly reduce both memory usage and computational load.
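A minimal NumPy sketch of the equivalence (sizes and variable names here are illustrative, not from the actual implementation):

```python
import numpy as np

# Illustrative sizes, not the real model's dimensions
vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(0)
weights = rng.normal(size=(vocab_size, embed_dim))

tokens = np.array([3, 1, 3])

# Current approach: one-hot encode, then matrix multiply
one_hot = np.eye(vocab_size)[tokens]   # shape (3, vocab_size)
via_matmul = one_hot @ weights         # shape (3, embed_dim)

# Proposed Embedding layer: index rows of the weight matrix directly
via_lookup = weights[tokens]           # shape (3, embed_dim)

assert np.allclose(via_matmul, via_lookup)
```

The lookup avoids materialising the one-hot matrix entirely and replaces an O(vocab_size × embed_dim) multiply per token with a single row read, which is where the memory and compute savings come from.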