WasmEdge / WasmEdge

WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime for cloud native, edge, and decentralized applications. It powers serverless apps, embedded functions, microservices, smart contracts, and IoT devices.

Home Page: https://WasmEdge.org


feat: ggml: support more parameters from llama.cpp

dm4 opened this issue · comments

commented

Summary

We currently support some of the parameters from llama.cpp, such as n_gpu_layers, ctx-size, and threads, and we expect to support even more of them.

Details

Refer to gpt_params_find_arg() in llama.cpp/common/common.cpp; we plan to support additional parameters from that list.
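
For illustration, here is a minimal, self-contained sketch of how extra llama.cpp-style options could be gathered into a configuration struct with defaults. The `key=value` metadata format and the parser below are simplified stand-ins for illustration only (they are not the plugin's actual metadata handling), and the default values merely approximate llama.cpp's.

// Hypothetical sketch: collecting a few llama.cpp-style options into a
// config struct with defaults. The "key=value,key=value" metadata format
// and parser are simplified stand-ins, not the plugin's real handling.
#include <cstdint>
#include <iostream>
#include <sstream>
#include <string>
#include <unordered_map>

struct SamplingConfig {
  uint64_t Seed = 0xFFFFFFFF;  // --seed
  uint64_t TopK = 40;          // --top-k
  double TopP = 0.95;          // --top-p
  double Temp = 0.8;           // --temp
  double RepeatPenalty = 1.1;  // --repeat-penalty
  uint64_t CtxSize = 512;      // --ctx-size
  uint64_t NGPULayers = 0;     // --n-gpu-layers
};

SamplingConfig parseMetadata(const std::string &Metadata) {
  SamplingConfig Config;
  // Split "key=value" pairs separated by commas into a map.
  std::unordered_map<std::string, std::string> Pairs;
  std::stringstream Stream(Metadata);
  std::string Item;
  while (std::getline(Stream, Item, ',')) {
    auto Pos = Item.find('=');
    if (Pos != std::string::npos) {
      Pairs[Item.substr(0, Pos)] = Item.substr(Pos + 1);
    }
  }
  // Override defaults only for the keys that were provided.
  if (Pairs.count("seed")) Config.Seed = std::stoull(Pairs["seed"]);
  if (Pairs.count("top-k")) Config.TopK = std::stoull(Pairs["top-k"]);
  if (Pairs.count("top-p")) Config.TopP = std::stod(Pairs["top-p"]);
  if (Pairs.count("temp")) Config.Temp = std::stod(Pairs["temp"]);
  if (Pairs.count("repeat-penalty"))
    Config.RepeatPenalty = std::stod(Pairs["repeat-penalty"]);
  if (Pairs.count("ctx-size")) Config.CtxSize = std::stoull(Pairs["ctx-size"]);
  if (Pairs.count("n-gpu-layers"))
    Config.NGPULayers = std::stoull(Pairs["n-gpu-layers"]);
  return Config;
}

int main() {
  auto Config = parseMetadata("ctx-size=4096,n-gpu-layers=35,temp=0.7");
  std::cout << "ctx-size=" << Config.CtxSize
            << " n-gpu-layers=" << Config.NGPULayers
            << " temp=" << Config.Temp << "\n";
  return 0;
}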

Appendix

The full list of options:

  • --seed
  • --threads
  • --threads-batch
  • --threads-draft
  • --threads-batch-draft
  • --prompt
  • --escape
  • --prompt-cache
  • --prompt-cache-all
  • --prompt-cache-ro
  • --binary-file
  • --file
  • --n-predict
  • --top-k
  • --ctx-size
  • --grp-attn-n
  • --grp-attn-w
  • --rope-freq-base
  • --rope-freq-scale
  • --rope-scaling
  • --rope-scale
  • --yarn-orig-ctx
  • --yarn-ext-factor
  • --yarn-attn-factor
  • --yarn-beta-fast
  • --yarn-beta-slow
  • --pooling
  • --defrag-thold
  • --samplers
  • --sampling-seq
  • --top-p
  • --min-p
  • --temp
  • --tfs
  • --typical
  • --repeat-last-n
  • --repeat-penalty
  • --frequency-penalty
  • --presence-penalty
  • --dynatemp-range
  • --dynatemp-exp
  • --mirostat
  • --mirostat-lr
  • --mirostat-ent
  • --cfg-negative-prompt
  • --cfg-negative-prompt-file
  • --cfg-scale
  • --batch-size
  • --ubatch-size
  • --keep
  • --draft
  • --chunks
  • --parallel
  • --sequences
  • --p-split
  • --model
  • --model-draft
  • --alias
  • --model-url
  • --hf-repo
  • --hf-file
  • --lora
  • --lora-scaled
  • --lora-base
  • --control-vector
  • --control-vector-scaled
  • --control-vector-layer-range
  • --mmproj
  • --image
  • --interactive
  • --embedding
  • --interactive-first
  • --instruct
  • --chatml
  • --infill
  • --dump-kv-cache
  • --no-kv-offload
  • --cache-type-k
  • --cache-type-v
  • --multiline-input
  • --simple-io
  • --cont-batching
  • --color
  • --mlock
  • --gpu-layers --n-gpu-layers
  • --gpu-layers-draft --n-gpu-layers-draft
  • --main-gpu
  • --split-mode
  • --tensor-split
  • --no-mmap
  • --numa
  • --verbose-prompt
  • --no-display-prompt
  • --reverse-prompt
  • --logdir
  • --lookup-cache-static
  • --lookup-cache-dynamic
  • --save-all-logits --kl-divergence-base
  • --perplexity --all-logits
  • --ppl-stride
  • --print-token-count
  • --ppl-output-type
  • --hellaswag
  • --hellaswag-tasks
  • --winogrande
  • --winogrande-tasks
  • --multiple-choice
  • --multiple-choice-tasks
  • --kl-divergence
  • --ignore-eos
  • --no-penalize-nl
  • --logit-bias
  • --help
  • --version
  • --random-prompt
  • --in-prefix-bos
  • --in-prefix
  • --in-suffix
  • --grammar
  • --grammar-file
  • --override-kv

Is this issue open for contributions? If yes, I would love to look into this.

commented

> Is this issue open for contributions? If yes, I would love to look into this.

Yes, this issue is open for contributions. We welcome your input and any code related to this issue.

Some parameters, such as --parallel and --draft, are not used directly in the internal implementation of llama.cpp (based on a search for "n_parallel" in the llama.cpp sources).
Only some parameters, such as the RoPE-related ones, affect the internal behavior of llama.cpp functions; for the others, integrating the processing logic needed to support them could completely change the implementation of compute(), as in the example below:

Sketch of integrating `--parallel` and `--draft` and parsing them as optional parameters in WasmEdge:
struct Graph {
    // ...
    uint64_t NParallel = 1;
    uint64_t NDraft = 1;
};

Expect<ErrNo> compute(WasiNNEnvironment &Env, uint32_t ContextId) noexcept {
    // ...
    // If --draft and --parallel are set, switch to speculative decoding.
    ReturnCode = SpeculativeDecoding(GraphRef, CxtRef);
    // Otherwise, keep the current implementation.
    // ...
}

ErrNo SpeculativeDecoding(Graph &GraphRef, Context &CxtRef) noexcept {
    // Implementation along the lines of
    // https://github.com/ggerganov/llama.cpp/blob/3292733f95d4632a956890a438af5192e7031c12/examples/speculative/speculative.cpp
}

detailed code: https://github.com/Fusaaaann/WasmEdge/blob/ae718df452658df555e2b4fe35e8c90e69c5c55f/plugins/wasi_nn/strategies/strategies.cpp#L234
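
For context, the control flow behind --draft works roughly like this: a small draft model speculatively proposes a short run of tokens, the target model verifies them, the longest accepted prefix is kept, and the target contributes one token of its own before the next round. Below is a toy, self-contained sketch of that loop; draftNextToken() and targetNextToken() are hypothetical placeholders standing in for real model evaluations, not llama.cpp APIs.

// Toy sketch of the draft-then-verify loop behind --draft / speculative
// decoding. The "models" are deterministic toy functions for illustration.
#include <cstdint>
#include <iostream>
#include <vector>

using Token = int32_t;

// Hypothetical stand-ins for draft-model and target-model sampling.
Token draftNextToken(const std::vector<Token> &Ctx) {
  return static_cast<Token>((Ctx.size() * 7 + 3) % 100);
}
Token targetNextToken(const std::vector<Token> &Ctx) {
  // Occasionally disagrees with the draft model, so some proposals get rejected.
  return static_cast<Token>((Ctx.size() * 7 + 3) % 100 + (Ctx.size() % 5 == 0 ? 1 : 0));
}

int main() {
  const size_t NDraft = 4;    // analogous to --draft
  const size_t NPredict = 16; // analogous to --n-predict
  std::vector<Token> Output;

  while (Output.size() < NPredict) {
    // 1. The draft model proposes NDraft tokens speculatively.
    std::vector<Token> Proposed = Output;
    std::vector<Token> Draft;
    for (size_t I = 0; I < NDraft; ++I) {
      Token T = draftNextToken(Proposed);
      Draft.push_back(T);
      Proposed.push_back(T);
    }

    // 2. The target model verifies the proposals; keep the longest matching
    //    prefix, then append one token from the target itself.
    size_t Accepted = 0;
    std::vector<Token> Verify = Output;
    for (Token T : Draft) {
      if (targetNextToken(Verify) != T)
        break;
      Verify.push_back(T);
      ++Accepted;
    }
    Output.insert(Output.end(), Draft.begin(), Draft.begin() + Accepted);
    Output.push_back(targetNextToken(Output));
  }

  std::cout << "generated " << Output.size() << " tokens\n";
  return 0;
}

In llama.cpp's actual examples/speculative/speculative.cpp, the verification step evaluates the drafted tokens with the target model in batched form, which is where the speed-up is expected to come from; the sketch above only shows the accept/reject control flow.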

What is WasmEdge's plan for supporting these parameters, if the wasi-nn functions become too complex to fit in one ggml.cpp file because of them?

commented

Hi @Fusaaaann,
We don't have a firm timeline for supporting the above parameters. If an application requires such options, we will raise their priority. There are already two different code paths in our plugin for handling normal LLM and LLaVA applications, so we don't mind if the complexity increases after adding more parameters.