karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

Some things that might be optimizations

Swight1423 opened this issue

I noticed these potential optimizations early in my reading of train_gpt2.c (I have not read the whole file yet), but I have not benchmarked them since I primarily program in another language.

1. In some of the loops that work with multidimensional arrays, the index calculations are done in the innermost loop. Some of these calculations could be split across the loop levels, because they mostly involve the index currently being advanced plus values that are effectively constant at that depth. A sketch follows this list.

2. In the matmul forward functions there is an inner loop that assigns zero when the bias is NULL. This conditional could be moved out of the loop, or even out of the function, since the bias pointer is not modified inside it. One option is to replace the check with a buffer holding the result of that calculation, and read from that buffer at the point where the conditional originally sat. If the buffer were hoisted out of the function entirely, it could be shared between all the matmul forward calls by sizing it for the largest size needed, which appears to be a constant. The matmul forward calls already restrict the range they touch, so the same buffer could simply be overwritten between calls, if that is not detrimental elsewhere. A second sketch follows below.
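For point 1, here is a minimal, unbenchmarked sketch of the hoisting idea. The function name `copy_hoisted` and its signature are hypothetical; the `(B, T, C)` flattened indexing mirrors the loop shape used throughout train_gpt2.c:

```c
#include <stddef.h>

// Hypothetical example of a B x T x C traversal.
// Naive form: out[b*T*C + t*C + c] recomputes the full flattened
// index for every element. Hoisted form below: the (b, t) base
// offset is computed once per row, so the inner loop only indexes by c.
void copy_hoisted(float* out, const float* inp, int B, int T, int C) {
    for (int b = 0; b < B; b++) {
        for (int t = 0; t < T; t++) {
            // computed once per (b, t) pair instead of once per element
            size_t base = (size_t)b * T * C + (size_t)t * C;
            float* out_bt = out + base;
            const float* inp_bt = inp + base;
            for (int c = 0; c < C; c++) {
                out_bt[c] = inp_bt[c];
            }
        }
    }
}
```

Whether this actually helps depends on the compiler; at -O2 the strength reduction is often done automatically, which is part of why benchmarks would be needed.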
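For point 2, a sketch under the assumption that matmul forward has the `(B, T, C, OC)` shape used in train_gpt2.c (out is B x T x OC, inp is B x T x C, weight is OC x C, bias is OC or NULL). The name `matmul_forward_nobranch` and the `zeros` parameter are hypothetical; the idea is that the caller allocates one shared buffer of zeros, sized for the largest OC, so the NULL check runs once per call rather than once per output element:

```c
#include <stddef.h>

void matmul_forward_nobranch(float* out, const float* inp,
                             const float* weight, const float* bias,
                             const float* zeros, // shared buffer of >= OC zeros
                             int B, int T, int C, int OC) {
    // one branch per call, instead of one per (b, t, o) triple
    const float* eff_bias = (bias != NULL) ? bias : zeros;
    for (int b = 0; b < B; b++) {
        for (int t = 0; t < T; t++) {
            const float* inp_bt = inp + (size_t)(b * T + t) * C;
            float* out_bt = out + (size_t)(b * T + t) * OC;
            for (int o = 0; o < OC; o++) {
                float val = eff_bias[o]; // branch-free initialization
                const float* wrow = weight + (size_t)o * C;
                for (int i = 0; i < C; i++) {
                    val += inp_bt[i] * wrow[i];
                }
                out_bt[o] = val;
            }
        }
    }
}

// Caller-side setup (hypothetical): allocate once, reuse for every call.
// float* zeros = calloc(max_OC, sizeof(float));
```

Since the buffer is read-only zeros, it never needs to be rewritten between calls; it just has to be at least as large as the biggest OC any matmul forward call uses.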