karpathy / llm.c

LLM training in simple, raw C/CUDA

Repository from GitHub: https://github.com/karpathy/llm.c

Suggestion: Test more Activation Functions

linux-leo opened this issue

Some of my favourites:

  • mish
  • serf (mish but with tanh replaced by erf)
  • TanhExp ( x*tanh(e^x) )

EDIT: Maybe also try activation functions based on the sine function, e.g. SinLU, or cheap-to-compute ReLU variants, e.g. ReLU Squared and StarReLU.