karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from Github: https://github.com/karpathy/llm.c

Precompute the scaling factor in gelu_forward and gelu_backward

ryanmcdermott opened this issue

Thank you so much for the amazing repo!

The scaling factor in gelu_forward and gelu_backward is computed each time the functions are run: float s = sqrtf(2.0f / M_PI);

This can be precomputed ahead of time and stored as a constant.
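A minimal sketch of what that could look like, assuming the tanh-approximation GELU that llm.c uses (the macro name is illustrative):

```c
#include <math.h>

// Hoisted to file scope so the expression is written once. Note this
// is a macro, so each use site still expands to sqrtf(2.0f / M_PI)
// textually; removing the runtime call is left to constant folding.
#define GELU_SCALING_FACTOR sqrtf(2.0f / M_PI)

void gelu_forward(float* out, const float* inp, int N) {
    for (int i = 0; i < N; i++) {
        float x = inp[i];
        float cube = 0.044715f * x * x * x;
        out[i] = 0.5f * x * (1.0f + tanhf(GELU_SCALING_FACTOR * (x + cube)));
    }
}
```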

Is this good enough?

Doesn't it get optimised away by the compiler anyway? (I haven't actually checked, though.) Plus, pointwise operations are bandwidth-limited, so adding or removing a few flops shouldn't make a difference.
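One way to check the constant-folding claim is to inspect the generated assembly; a quick sketch (file name and compiler flags are just one example):

```c
// fold_check.c -- compile with e.g.: cc -O2 -S fold_check.c
// The argument to sqrtf is a compile-time constant, so an optimizing
// compiler is normally free to fold the whole expression down to the
// literal ~0.79788456f. If no call to sqrtf appears in fold_check.s,
// the computation was done at compile time.
#include <math.h>

float scale_input(float x) {
    float s = sqrtf(2.0f / M_PI);
    return s * x;
}
```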

fixed

@ryanmcdermott Isn't #defining a constant just textual replacement, so the actual calculation is still repeated?
f26cf00
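For what it's worth: yes, a #define is pure textual substitution, so the expression reappears at every use site, and eliminating the runtime sqrtf call is left to the compiler's constant folding (which mainstream optimizing compilers perform for a constant, non-negative argument). A sketch of a way to sidestep that reliance entirely, by baking in the folded value as a literal (name illustrative):

```c
#include <math.h>
#include <stdio.h>

// 0.7978845608028654 is sqrt(2/pi) precomputed by hand; as a plain
// constant there is no sqrtf call left to fold at all.
static const float GELU_SCALE = 0.7978845608028654f;

int main(void) {
    // sanity check against the library-computed value
    printf("%.9f vs %.9f\n", (double)GELU_SCALE, sqrt(2.0 / M_PI));
    return 0;
}
```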