WasmEdge / WasmEdge

WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime for cloud native, edge, and decentralized applications. It powers serverless apps, embedded functions, microservices, smart contracts, and IoT devices.

Home Page: https://WasmEdge.org

feat: Support `--tensor-split` in the ggml plugin

hydai opened this issue

Summary

When running a large MoE model, its large tensors should be split across multiple GPUs. This is especially useful when the available GPUs have different VRAM sizes.

Details

  • Support `--tensor-split`.
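For context, llama.cpp's `--tensor-split` takes a comma-separated list of proportions, one per GPU (for example `3,1`). The sketch below illustrates how such a value could be derived from per-GPU VRAM sizes and normalized into fractions. It assumes the ggml plugin would accept the same format; the function names `split_from_vram` and `parse_tensor_split` are hypothetical and not part of WasmEdge or llama.cpp.

```python
from typing import List


def split_from_vram(vram_gib: List[float]) -> str:
    """Build a proportion string from per-GPU VRAM sizes (hypothetical helper).

    Each GPU's share is simply proportional to its VRAM, e.g. [24, 8] -> "24,8".
    """
    if not vram_gib:
        raise ValueError("at least one GPU is required")
    return ",".join(f"{v:g}" for v in vram_gib)


def parse_tensor_split(value: str) -> List[float]:
    """Parse a comma-separated proportion list into normalized fractions.

    Assumption: proportions are relative weights, so "3,1" means the first
    GPU holds 75% of the split tensors and the second holds 25%.
    """
    parts = [float(p) for p in value.split(",") if p.strip()]
    if not parts or any(p < 0 for p in parts):
        raise ValueError(f"invalid tensor split: {value!r}")
    total = sum(parts)
    if total == 0:
        raise ValueError("tensor split proportions must not all be zero")
    return [p / total for p in parts]


if __name__ == "__main__":
    # Two GPUs with 24 GiB and 8 GiB of VRAM.
    value = split_from_vram([24, 8])
    print(value)
    print(parse_tensor_split(value))
```

Using relative proportions rather than absolute byte counts keeps the option independent of model size: the same value works for any model as long as the ratio between the GPUs stays the same.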

Appendix

No response