WasmEdge / WasmEdge

WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime for cloud native, edge, and decentralized applications. It powers serverless apps, embedded functions, microservices, smart contracts, and IoT devices.

Home Page: https://WasmEdge.org

question: Possibility of simplifying running AI model-specific wasm workloads with embedded configuration

sohankunkerkar opened this issue

Summary

I have been using WasmEdge with the llama2 model, and it's working great with the following command:

$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
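
For reference, these flags follow the documented WasmEdge CLI conventions: --dir preopens a host directory inside the wasm guest, and --nn-preload takes an ALIAS:BACKEND:TARGET:PATH tuple for the WASI-NN plugin. The same command, annotated:

# --dir .:. preopens the current host directory as "." in the guest, so the
# module can read the model and prompt files.
# --nn-preload registers the GGUF model under the alias "default" with the
# GGML backend; AUTO lets the plugin choose the execution target.
$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm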

In a containerized environment where WasmEdge is integrated into a lower-level runtime called crun-wasm, I would like to simplify running the llama2 model. The goal is to embed the configuration settings directly into the wasm workload, allowing for more straightforward execution under crun-wasm.

Is it possible to configure the llama-chat.wasm workload so that the essential settings (such as --dir and --nn-preload) are embedded directly into the wasm file? This would enable us to run the model like:

$ ./llama-chat.wasm

Please provide guidance or suggestions on achieving more streamlined execution of AI-integrated wasm workloads in a containerized environment where WasmEdge is integrated into a lower-level runtime.
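
Until configuration can live inside the wasm file itself, one interim workaround (not an official WasmEdge or crun feature, just a sketch) is to bake the flags into a small launcher script in the container image so the workload still starts with a single command. The script name and the /app and /models paths below are assumptions about the image layout:

$ cat /usr/local/bin/llama-chat
#!/bin/sh
# Hard-code the WasmEdge flags; /models and /app are assumed image paths.
exec wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:/models/llama-2-7b-chat-q5_k_m.gguf \
  /app/llama-chat.wasm "$@"

With that in place, the container entrypoint reduces to:

$ llama-chat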

Appendix

No response

Hi @sohankunkerkar

We have been trying very hard to get the WasmEdge GGML plugin working with crun-wasm. It is high on our list.

A sticking point is that crun does not detect the correct CUDA version on the host machine. We are not sure why. @hydai perhaps can shed more light on this?
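
A rough diagnostic sketch for anyone reproducing this, assuming nvidia-smi is installed on the host and a CDI spec has been generated by the NVIDIA container toolkit; the image tag and device name are illustrative, not prescribed by crun or WasmEdge:

# What the host's driver reports:
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
# What a crun-launched container sees (CDI device name depends on host setup):
$ podman run --rm --runtime crun --device nvidia.com/gpu=all \
    docker.io/nvidia/cuda:12.2.0-base-ubi8 nvidia-smi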

@juntao Thanks for the reply. Let me know if you need some traction on the crun-specific issue.

Hi @CaptainVincent
We would like to have an issue tracking the crun + WasmEdge + ggml plugin integration. Could you please raise a new issue describing the current status of the crun-related integration?

#3217 has been added.

@sohankunkerkar You mention "simplifying" running llama in a containerized environment. Could you please point me to a resource with steps for running llama2 in a crun + wasm environment that currently works?

@shiveshcodes, I believe the links provided by @CaptainVincent would be helpful for your needs. If you're interested in the cri-o + crun workflow, you can also check out this link: https://github.com/sohankunkerkar/wasm-kubecon-demos/tree/main.
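
For context on the flow those links describe: the widely documented pattern is to mark the OCI image with the module.wasm.image/variant=compat annotation so that a wasm-enabled crun build dispatches it to its wasm handler. Roughly, with a placeholder image name:

# The Containerfile would copy llama-chat.wasm (and the model) into the image.
$ buildah build --annotation "module.wasm.image/variant=compat" -t llama-chat-wasm .
# Assumes the configured OCI runtime is a crun build with WasmEdge support.
$ podman run --rm llama-chat-wasm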