About pull request of custom kernel implementation
KarhouTam opened this issue
Hey. First of all, I really appreciate this repo; it has helped me a lot in learning CUDA programming.
I tried implementing a softmax forward kernel myself, combining some optimization techniques I've learned, and found that it actually works well on my GPU (3070 laptop).
Does this repo accept PRs for custom implementations of existing kernels?