bclarkson-code / Tricycle

Deep learning framework completely from scratch in python + numpy

Improve SmolGPT

bclarkson-code opened this issue · comments

SmolGPT (49M) does not perform very well. I gave it the following prompt:

def add_one(x):

It completed it as follows:

def add_one(x):
    return 10

I think that there are a number of issues.

First, the model could of course be bigger. With more optimised kernels, modern techniques like rotary embeddings, and multi-GPU support, we will hopefully be able to train a larger model in a reasonable amount of time.
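For reference, here is a minimal NumPy sketch of what rotary embeddings do to a single attention head. It is not Tricycle's implementation: the function name `rotary_embedding`, the `(seq_len, head_dim)` layout, and the `base=10000.0` default are assumptions made for illustration.

```python
import numpy as np


def rotary_embedding(x, base=10000.0):
    """Rotate channel pairs of x by a position-dependent angle.

    x: float array of shape (seq_len, head_dim), head_dim must be even.
    Hypothetical helper for illustration, not part of Tricycle.
    """
    seq_len, head_dim = x.shape
    assert head_dim % 2 == 0, "head_dim must be even"

    # One rotation frequency per pair of channels.
    inv_freq = 1.0 / (base ** (np.arange(0, head_dim, 2) / head_dim))
    # angles[p, i] = position p * frequency i, shape (seq_len, head_dim // 2)
    angles = np.outer(np.arange(seq_len), inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)

    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    # Standard 2D rotation applied to each (even, odd) channel pair.
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out
```

Applied to both queries and keys before the attention dot product, this makes the score between positions m and n depend only on their offset m - n, without a separate learned position embedding.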

Second, the dataset can probably be improved. More evaluation is needed, but I think adding some web text has the potential to make the model easier to use.
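One way to make the evaluation concrete is a small functional check on completions like the one above. This is only a sketch: `passes_check` and the test cases are hypothetical, and `exec` should only ever be run on model output inside a sandbox.

```python
def passes_check(completion_source, func_name, cases):
    """Return True if the completed function gives the expected output
    for every (argument, expected) pair in cases.

    Hypothetical helper; exec runs arbitrary code, so sandbox it.
    """
    namespace = {}
    try:
        exec(completion_source, namespace)
        func = namespace[func_name]
        return all(func(arg) == expected for arg, expected in cases)
    except Exception:
        # Syntax errors, missing functions and runtime errors all count as failures.
        return False


cases = [(0, 1), (1, 2), (41, 42)]
smol_output = "def add_one(x):\n    return 10\n"      # the completion quoted above
reference = "def add_one(x):\n    return x + 1\n"    # what we would like to see

smol_ok = passes_check(smol_output, "add_one", cases)
reference_ok = passes_check(reference, "add_one", cases)
```

Running a handful of prompts like this before and after a dataset change would give a rough pass rate to compare.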