CUDA implementation of ggml_clamp
jploski opened this issue
ggml_clamp currently lacks a CUDA implementation. It is rather trivial to add, but it's something I ran into while porting MPT to llama.cpp. I have a ready implementation and will open a PR for it, referencing this issue.
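For reference, the elementwise operation itself is simple. A minimal sketch of what such a kernel could look like, written in the style of ggml-cuda's other elementwise ops — the function names and block size here are illustrative assumptions, not the actual PR:

```cuda
#include <cuda_runtime.h>

// Clamp each element of x into [min, max]; one thread per element.
// Names and the 256-thread block size are assumptions for illustration.
static __global__ void clamp_f32(const float * x, float * dst,
                                 const float min, const float max, const int k) {
    const int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i >= k) {
        return;
    }
    dst[i] = x[i] < min ? min : (x[i] > max ? max : x[i]);
}

// Host-side launcher: round the element count up to whole blocks.
static void clamp_f32_cuda(const float * x, float * dst,
                           const float min, const float max, const int k,
                           cudaStream_t stream) {
    const int block_size = 256; // assumed block size
    const int num_blocks = (k + block_size - 1) / block_size;
    clamp_f32<<<num_blocks, block_size, 0, stream>>>(x, dst, min, max, k);
}
```

The bounds check `i >= k` is needed because the last block is generally only partially filled.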
Hmm, I'm a bit confused, as the internals (e.g. of ggml_scale) on the ggml master branch look different from those on llama.cpp master. llama.cpp's implementation of ggml-cuda.cu seems more recent (and cleaner looking, too). Will ggml eventually catch up to llama.cpp? (In that case I would contribute the patch to llama.cpp instead.)
Most of the development of the CUDA backend happens in the llama.cpp repository, and the changes are merged here regularly. Opening the PR in the llama.cpp repository is fine; the changes will eventually end up here. I see that the ggml_clamp implementation is already in your MPT PR in llama.cpp, so there is no need to open another PR here.
Thanks for clarifying - I thought it worked exactly the other way around.