Precompute the scaling factor in gelu_forward and gelu_backward
ryanmcdermott opened this issue
Thank you so much for the amazing repo!
The scaling factor in gelu_forward and gelu_backward is recomputed every time the functions are called: float s = sqrtf(2.0f / M_PI);
This can be precomputed ahead of time and stored as a constant.
Is this good enough?
Doesn't the compiler optimise it away anyway? (I haven't actually checked, though.) Also, pointwise operations are bandwidth-limited, so adding or removing a few FLOPs shouldn't make a difference.
fixed
@ryanmcdermott Isn't #define-ing a constant just textual replacement, so the actual calculation is still repeated?
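To illustrate the question, here is a small sketch contrasting a macro that expands to an expression with one that expands to a literal. The macro and function names are hypothetical, and pi is written out as a literal because M_PI is not part of standard C. With GCC or Clang and optimization enabled, a sqrtf call on a constant argument is typically folded at compile time, but that is a compiler behavior rather than a language guarantee:

```c
#include <math.h>

/* Textual replacement: every use expands to the sqrtf call expression.
   Whether the call survives to run time depends on the compiler's
   constant folding. */
#define SCALE_EXPR sqrtf(2.0f / 3.14159265358979323846f)

/* Literal: there is nothing left to evaluate at run time,
   regardless of optimization level. */
#define SCALE_LITERAL 0.7978845608f

float scale_from_expr(void) { return SCALE_EXPR; }
float scale_from_literal(void) { return SCALE_LITERAL; }
```

Comparing the assembly for the two functions (for example with the compiler's -S flag) is one way to check whether the call was actually folded for a given compiler and flag set.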
f26cf00