Add embedding layer
bclarkson-code opened this issue · comments
In the current GPT implementation, tokens are embedded by one-hot encoding them and passing them through a linear layer. Because a one-hot vector selects a single row of the weight matrix, the full matrix multiplication is unnecessary: a direct row lookup gives the same result.
We can encapsulate this in an `Embedding` layer, which should significantly reduce both memory usage and computational load.
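A minimal NumPy sketch of the equivalence (sizes and variable names here are illustrative, not from the actual implementation):

```python
import numpy as np

# Illustrative sizes, not the real model's dimensions
vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(0)
weights = rng.normal(size=(vocab_size, embed_dim))

tokens = np.array([3, 1, 3])

# Current approach: one-hot encode, then matrix multiply
one_hot = np.eye(vocab_size)[tokens]   # shape (3, vocab_size)
via_matmul = one_hot @ weights         # shape (3, embed_dim)

# Proposed Embedding layer: index rows of the weight matrix directly
via_lookup = weights[tokens]           # shape (3, embed_dim)

assert np.allclose(via_matmul, via_lookup)
```

The lookup avoids materialising the one-hot matrix entirely and replaces an O(vocab_size × embed_dim) multiply per token with a single row read, which is where the memory and compute savings come from.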