WasmEdge / WasmEdge

WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime for cloud native, edge, and decentralized applications. It powers serverless apps, embedded functions, microservices, smart contracts, and IoT devices.

Home Page: https://WasmEdge.org

question: Possibility of simplifying running AI model-specific wasm workloads with embedded configuration

sohankunkerkar opened this issue

Summary

I have been using WasmEdge with the llama2 model, and it's working great with the following command:

$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm
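
For reference, these flags follow the documented WasmEdge CLI conventions: --dir preopens a host directory inside the wasm guest, and --nn-preload takes an ALIAS:BACKEND:TARGET:PATH tuple for the WASI-NN plugin. The same command, annotated:

# --dir .:. preopens the current host directory as "." in the guest, so the
# module can read the model and prompt files.
# --nn-preload registers the GGUF model under the alias "default" with the
# GGML backend; AUTO lets the plugin choose the execution target.
$ wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat-q5_k_m.gguf llama-chat.wasm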

In a containerized environment where WasmEdge is integrated into a lower-level runtime called crun-wasm, I would like to simplify running the llama2 model. The goal is to embed the configuration settings directly into the wasm workload, allowing for more straightforward execution under crun-wasm.

Is it possible to configure the llama-chat.wasm workload so that the essential settings (such as --dir and --nn-preload) are embedded directly into the wasm file? This would enable us to run the model like:

$ ./llama-chat.wasm

Please provide guidance or suggestions on achieving more streamlined execution of AI-integrated wasm workloads in a containerized environment where WasmEdge is integrated into a lower-level runtime.
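
Until configuration can live inside the wasm file itself, one interim workaround (not an official WasmEdge or crun feature, just a sketch) is to bake the flags into a small launcher script in the container image so the workload still starts with a single command. The script name and the /app and /models paths below are assumptions about the image layout:

$ cat /usr/local/bin/llama-chat
#!/bin/sh
# Hard-code the WasmEdge flags; /models and /app are assumed image paths.
exec wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:/models/llama-2-7b-chat-q5_k_m.gguf \
  /app/llama-chat.wasm "$@"

With that in place, the container entrypoint reduces to:

$ llama-chat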

Appendix

No response

Hi @sohankunkerkar

We have been trying very hard to get the WasmEdge GGML plugin working with crun-wasm. It is high on our list.

A sticking point is that crun does not detect the correct CUDA version on the host machine. We are not sure why. @hydai perhaps can shed more light on this?
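
A rough diagnostic sketch for anyone reproducing this, assuming nvidia-smi is installed on the host and a CDI spec has been generated by the NVIDIA container toolkit; the image tag and device name are illustrative, not prescribed by crun or WasmEdge:

# What the host's driver reports:
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
# What a crun-launched container sees (CDI device name depends on host setup):
$ podman run --rm --runtime crun --device nvidia.com/gpu=all \
    docker.io/nvidia/cuda:12.2.0-base-ubi8 nvidia-smi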

@juntao Thanks for the reply. Let me know if you need some traction on the crun-specific issue.

Hi @CaptainVincent
We would like to have an issue tracking the crun + WasmEdge + ggml plugin integration. Could you please raise a new issue describing the current status of the crun-related integration?

#3217 has been added.

@sohankunkerkar You mention "simplifying" running llama in a containerized environment. Could you please point me to a resource with steps for running llama2 in a crun + wasm environment that currently works?

@shiveshcodes, I believe the links provided by @CaptainVincent would be helpful for your needs. If you're interested in the cri-o + crun workflow, you can also check out this link: https://github.com/sohankunkerkar/wasm-kubecon-demos/tree/main.
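
For context on the flow those links describe: the widely documented pattern is to mark the OCI image with the module.wasm.image/variant=compat annotation so that a wasm-enabled crun build dispatches it to its wasm handler. Roughly, with a placeholder image name:

# The Containerfile would copy llama-chat.wasm (and the model) into the image.
$ buildah build --annotation "module.wasm.image/variant=compat" -t llama-chat-wasm .
# Assumes the configured OCI runtime is a crun build with WasmEdge support.
$ podman run --rm llama-chat-wasm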