feat: Support `--tensor-split` in the ggml plugin
hydai opened this issue · comments
Summary
When running a large MoE model, the large tensors should be split across multiple GPUs. This is especially helpful when the GPUs have different VRAM sizes.
Details
- Support `--tensor-split`.
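For illustration only: in llama.cpp, `--tensor-split` takes a comma-separated list of proportions, one per GPU (for example `3,1`). The sketch below shows one way such a value could be derived from per-GPU VRAM sizes. The helper name `tensor_split_arg` and the idea of weighting purely by VRAM are assumptions made for this example, not part of the plugin, and how the ggml plugin would expose the option is not specified here.

```python
def tensor_split_arg(vram_gib):
    """Build a comma-separated proportion string, one entry per GPU.

    Hypothetical helper: weights each GPU by its VRAM size only.
    """
    if not vram_gib or any(v <= 0 for v in vram_gib):
        raise ValueError("expected a non-empty list of positive VRAM sizes")
    total = sum(vram_gib)
    # Normalize so the proportions sum to (roughly) 1 after rounding.
    return ",".join(f"{v / total:.2f}" for v in vram_gib)


# Example: one 24 GiB GPU and one 8 GiB GPU.
print(tensor_split_arg([24, 8]))  # 0.75,0.25
```

With mismatched GPUs, a proportional split like this would keep the smaller card from being assigned more tensor data than it can hold, which is the case this request is mainly about.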
Appendix
No response