Improve SmolGPT
bclarkson-code opened this issue
SmolGPT (49M) does not perform very well. I gave it the following prompt:
def add_one(x):
It completed it as follows:
def add_one(x):
    return 10
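A failure like this could be made measurable with a small functional check. The following is only a minimal sketch: `completion_passes` and the test cases are hypothetical names made up for illustration, not part of the SmolGPT code, and `exec` should only ever be run on model output inside a sandbox.

```python
def completion_passes(source, func_name, cases):
    """Return True if the code in `source` defines `func_name` and it
    returns the expected value for every (args, expected) pair in `cases`."""
    namespace = {}
    try:
        # Run the generated code; only do this in a sandboxed environment.
        exec(source, namespace)
        func = namespace[func_name]
        return all(func(*args) == expected for args, expected in cases)
    except Exception:
        # Syntax errors, missing names, and runtime errors all count as failures.
        return False


# Hypothetical test cases for the add_one prompt.
cases = [((1,), 2), ((41,), 42)]

model_output = "def add_one(x):\n    return 10\n"
reference = "def add_one(x):\n    return x + 1\n"

print(completion_passes(model_output, "add_one", cases))
print(completion_passes(reference, "add_one", cases))
```

Running the same check over a handful of prompts would give a rough pass rate to compare model versions against.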
I think that there are a number of issues.
First, the model could of course be bigger. With more optimised kernels, modern techniques like rotary embeddings, and multi-GPU support, we will hopefully be able to train a larger model in a reasonable amount of time.
Second, the dataset can probably be improved. More evaluation is needed, but I think adding some web text could make the model easier to use.
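For reference, the rotary embeddings mentioned in the first point can be sketched in a few lines. This is a plain-Python illustration of the general technique applied to a single vector, assuming the commonly used base of 10000. The names `rotary_embed` and `dot` are hypothetical, and a real implementation would operate on batched tensors inside the attention layer.

```python
import math


def rotary_embed(vec, position, base=10000.0):
    """Rotate each consecutive pair of `vec` by an angle that depends on
    `position` and on the pair's index (rotary position embedding)."""
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("vector length must be even")
    out = []
    for i in range(0, dim, 2):
        # Lower-index pairs rotate quickly, higher-index pairs slowly.
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]

# The attention score depends only on the relative offset (2 in both cases).
score_a = dot(rotary_embed(q, 5), rotary_embed(k, 3))
score_b = dot(rotary_embed(q, 9), rotary_embed(k, 7))
print(abs(score_a - score_b) < 1e-9)
```

Because the query and key are rotated by angles proportional to their positions, their dot product depends only on the distance between them, which is the property that makes this scheme attractive for attention.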