question: Possibility of simplifying running AI model-specific wasm Workloads with Embedded Configuration
sohankunkerkar opened this issue
Summary
I have been using WasmEdge with the llama2 model, and it's working great with the following command:
$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
In a containerized environment where WasmEdge is integrated into a lower-level runtime called crun-wasm, I would like to simplify running the llama2 model. The goal is to embed configuration settings directly into the wasm workload, allowing for more straightforward execution in the containerized environment using crun-wasm.
Is it possible to configure the llama-chat.wasm workload so that essential settings (such as --dir and --nn-preload) are embedded directly into the wasm file? This would enable us to run the model like:
$ ./llama-chat.wasm
Please provide guidance or suggestions on achieving a more streamlined execution of AI-integrated wasm workloads in a containerized environment where WasmEdge is integrated into a lower-level runtime.
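For comparison, the closest approximation that does not require changing the wasm file is a thin wrapper that bakes the flags in. The sketch below only reuses the command shown above; it is not an embedded configuration, and the function name `run_llama_chat` and the `WASMEDGE`, `MODEL`, and `WASM_APP` variables are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: keeps the --dir and --nn-preload settings in one place
# so callers do not have to repeat them. All variable names are assumptions.

WASMEDGE="${WASMEDGE:-wasmedge}"                 # runtime binary to invoke
MODEL="${MODEL:-llama-2-7b-chat-q5_k_m.gguf}"    # model file from the command above
WASM_APP="${WASM_APP:-llama-chat.wasm}"          # wasm workload to run

run_llama_chat() {
    # Same flags as the manual invocation; extra arguments are passed through.
    "$WASMEDGE" --dir .:. \
        --nn-preload "default:GGML:AUTO:${MODEL}" \
        "$WASM_APP" "$@"
}
```

Adding `run_llama_chat "$@"` as the last line of the script would turn it into a launcher that can be started without any flags. Setting `WASMEDGE=echo` before calling the function prints the assembled arguments instead of starting the runtime, which is handy for checking the flags.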
Appendix
No response
We have been trying very hard to get the WasmEdge GGML plugin working with crun-wasm. It is high on our list.
A sticking point is that crun does not detect the correct CUDA version on the host machine, and we are not sure why. Perhaps @hydai can shed more light on this?
@juntao Thanks for the reply. Let me know if you need some traction on the crun-specific issue.
Hi @CaptainVincent
We would like to have an issue tracking the crun + WasmEdge + GGML plugin work. Could you please raise a new issue and describe the current status of the crun-related integration?
@sohankunkerkar You mention "simplifying" running llama in a containerized environment. Could you please point me to a resource with steps for running llama2 in a crun wasm environment that currently works?
I have two documents that you might find useful:
https://wasmedge.org/docs/start/build-and-run/docker_wasm_gpu
https://wasmedge.org/docs/start/build-and-run/podman_wasm_gpu
@shiveshcodes, I believe the links provided by @CaptainVincent would be helpful for your needs. If you're interested in the cri-o + crun workflow, you can also check out this link: https://github.com/sohankunkerkar/wasm-kubecon-demos/tree/main.