bclarkson-code / Tricycle

Deep learning framework completely from scratch in python + numpy


Add embedding layer

bclarkson-code opened this issue

In the current GPT implementation, tokens are embedded by one-hot encoding them and passing them through a linear layer. Because the tokens are one-hot encoded, we don't actually need the full matrix multiplication: multiplying a one-hot vector by the weight matrix just selects the row corresponding to the token's index, so we can replace the matmul with a direct lookup of that row.
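A minimal numpy check of that equivalence (the variable names here are illustrative, not part of Tricycle's API):

```python
import numpy as np

vocab_size, embedding_dim = 8, 4
weights = np.random.randn(vocab_size, embedding_dim)
token_ids = np.array([3, 1, 3, 0])

# Current approach: one-hot encode, then matrix multiply
one_hot = np.eye(vocab_size)[token_ids]  # shape (4, 8)
via_matmul = one_hot @ weights           # shape (4, 4)

# Proposed approach: index straight into the weight matrix
via_lookup = weights[token_ids]          # shape (4, 4)

assert np.allclose(via_matmul, via_lookup)
```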

We can encapsulate this in an Embedding layer, which should significantly reduce both memory usage and computational load. A rough sketch of what such a layer could look like is below.
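The sketch assumes a simple forward/backward layer interface; the class and method names are placeholders and may not match Tricycle's actual layer API:

```python
import numpy as np

class Embedding:
    """Sketch of an embedding layer: a learned lookup table.

    Forward indexes into the weight matrix instead of doing a
    one-hot matmul; backward scatter-adds the incoming gradient
    into the rows that were looked up.
    """

    def __init__(self, vocab_size: int, embedding_dim: int):
        self.weights = np.random.randn(vocab_size, embedding_dim) * 0.02
        self.grad = np.zeros_like(self.weights)

    def forward(self, token_ids: np.ndarray) -> np.ndarray:
        # Remember which rows were used so backward can route gradients
        self._token_ids = token_ids
        return self.weights[token_ids]

    def backward(self, grad_output: np.ndarray) -> None:
        # Accumulate gradients only into the rows that were looked up
        np.add.at(self.grad, self._token_ids, grad_output)
```

Since only the rows for the tokens in the batch are touched, this avoids materialising the one-hot matrix entirely and turns the embedding step into O(batch × embedding_dim) work instead of a full (batch × vocab) by (vocab × embedding_dim) matmul.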