CUDA implementation of ggml_clamp
jploski opened this issue
ggml_clamp currently lacks a CUDA implementation. It is rather trivial to add, but it's something I ran into while porting MPT to llama.cpp. I have a ready implementation and will open a PR for it, referencing this issue.
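For reference, the elementwise operation itself is simple. A minimal sketch of what such a kernel could look like, written in the style of ggml-cuda's other elementwise ops — the function names and block size here are illustrative assumptions, not the actual PR:

```cuda
#include <cuda_runtime.h>

// Clamp each element of x into [min, max]; one thread per element.
// Names and the 256-thread block size are assumptions for illustration.
static __global__ void clamp_f32(const float * x, float * dst,
                                 const float min, const float max, const int k) {
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= k) {
        return;
    }
    dst[i] = x[i] < min ? min : (x[i] > max ? max : x[i]);
}

// Host-side launcher: round the element count up to whole blocks.
static void clamp_f32_cuda(const float * x, float * dst,
                           const float min, const float max, const int k,
                           cudaStream_t stream) {
    const int block_size = 256; // assumed block size
    const int num_blocks = (k + block_size - 1) / block_size;
    clamp_f32<<<num_blocks, block_size, 0, stream>>>(x, dst, min, max, k);
}
```

The bounds check `i >= k` is needed because the last block is generally only partially filled.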
Hmm, I'm a bit confused, as the internals (e.g. of ggml_scale) on the ggml master branch look different from those on llama.cpp master. llama.cpp's implementation of ggml-cuda.cu seems more recent (and cleaner looking, too). Will ggml eventually catch up to llama.cpp? (In that case I would contribute the patch to llama.cpp instead.)
Most of the development of the CUDA backend happens in the llama.cpp repository, and the changes are merged here regularly. Opening the PR in the llama.cpp repository is fine; the changes will eventually end up here. I see that the ggml_clamp implementation is already in your MPT PR in llama.cpp, so there is no need to open another PR here.
Thanks for clarifying - I thought it worked exactly the other way around.