Torch and tinygrad implementations of Karpathy's nanoGPT, trained on the Tiny Shakespeare text.
The tinygrad version is much slower than the PyTorch one, probably because I am missing some detail about tinygrad. Even though it uses CUDA, there seems to be room for optimization here.