LFX Workspace: Integrate Intel Extension for Transformers as a new WASI-NN backend
grorge123 opened this issue · comments
Summary
Motivation
With the rise of Large Language Models (LLMs), there's an increasing demand for running LLM models on CPUs. The Intel Extension for Transformers is a toolkit that can accelerate LLM models on various Intel platforms. The goal of this task is to integrate the Intel Extension for Transformers into the WASI-NN backend to enhance the performance of running LLMs on CPUs.
Details
Milestones:
- Understanding WASI-NN and Intel Extension for Transformers (week 2 ~ 3)
- Study how existing WASI-NN supports GGML (https://github.com/second-state/wasmedge-wasi-nn/blob/ggml/rust/src/lib.rs)
- Research how WasmEdge connects to GGML (https://github.com/WasmEdge/WasmEdge/blob/master/plugins/wasi_nn/ggml.cpp)
- Study the Intel Extension for Transformers (https://github.com/intel/intel-extension-for-transformers)
- Simple unit testing (week 4)
- Conduct simple runtime tests
- Implementing the Plugin Part (week 4~8)
- Research how to compile the Intel Extension for Transformers into a library file
- Add a WasmEdge Intel Extension for Transformers plugin
- Add WASI-NN support for the Intel Extension for Transformers
- Add a new function to the install.sh script for package installation
- Complete Unit Testing and Create Sample Tutorials (week 9 ~ week 10)
- Complete all unit tests
- Complete sample tutorials
- Documentation (week 11 ~ week 12)
- Complete the documentation.
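The GGML study items in the first milestone boil down to the five-call WASI-NN flow (load, init_execution_context, set_input, compute, get_output) that any new backend must implement. Below is a minimal Python sketch of that flow against a hypothetical mock backend; all names and behavior are illustrative stand-ins for the plugin-side state, not the real WasmEdge plugin API.

```python
# Sketch of the WASI-NN call sequence a new backend must support.
# Real guests invoke these as host functions via wasmedge-wasi-nn;
# this in-process mock only illustrates the call flow.

class MockNeuralSpeedBackend:
    """Stands in for the plugin-side state; everything here is illustrative."""

    def __init__(self):
        self.graphs = {}    # graph handle -> model bytes
        self.contexts = {}  # context handle -> per-request state
        self._next = 0

    def load(self, model_bytes: bytes) -> int:
        # The real plugin would parse/initialize the model here.
        handle = self._next
        self._next += 1
        self.graphs[handle] = model_bytes
        return handle

    def init_execution_context(self, graph: int) -> int:
        ctx = self._next
        self._next += 1
        self.contexts[ctx] = {"graph": graph, "input": None, "output": None}
        return ctx

    def set_input(self, ctx: int, tensor):
        self.contexts[ctx]["input"] = tensor

    def compute(self, ctx: int):
        # Placeholder "inference": reverse the input. A real backend
        # would run model inference here.
        self.contexts[ctx]["output"] = self.contexts[ctx]["input"][::-1]

    def get_output(self, ctx: int):
        return self.contexts[ctx]["output"]


backend = MockNeuralSpeedBackend()
g = backend.load(b"fake-model-weights")
ctx = backend.init_execution_context(g)
backend.set_input(ctx, [1, 2, 3])
backend.compute(ctx)
print(backend.get_output(ctx))  # -> [3, 2, 1]
```

The same handle-based shape appears in the GGML backend linked above, which is why the plugin work can largely mirror the existing GGML code paths.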
Appendix
Week 1 update
- Switched to the neural-speed repository (in place of Intel Extension for Transformers) to implement the model computation part.
- Spent time understanding the WASI-NN code in WasmEdge and wasmedge-wasi-nn.
Week 2 update
- Successfully embedded Neural Speed Python code in a C++ program.
- Created a new backend struct in wasmedge-wasi-nn.
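One way to structure the Week 2 embedding is a small Python wrapper module that the C++ host imports through the embedded interpreter (e.g. via PyImport_ImportModule / PyObject_CallMethod). The sketch below is hedged: the `load_model`/`generate` names are hypothetical, and the real neural_speed calls are replaced with stubs so the snippet stays self-contained.

```python
# Hypothetical wrapper module the C++ host could call through embedded
# CPython. The real backend would call into neural_speed here; these
# stubs keep the example runnable, since the exact neural_speed API is
# an assumption rather than something confirmed by the issue.

_model = None

def load_model(path: str) -> bool:
    """Called once by the host; would construct the Neural Speed model."""
    global _model
    _model = {"path": path}  # stub standing in for a real model object
    return True

def generate(prompt: str, max_tokens: int = 8) -> str:
    """Called per request; would run Neural Speed text generation."""
    if _model is None:
        raise RuntimeError("load_model() must be called first")
    # Stub "generation": truncate the prompt to max_tokens words.
    return " ".join(prompt.split()[:max_tokens])

if __name__ == "__main__":
    load_model("model.bin")  # hypothetical model file name
    print(generate("Hello from the C++ host embedding Python", 3))
```

Keeping the host-facing surface to a couple of module-level functions makes the C++ side of the embedding simple: it only needs to import one module and call two methods.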
Week 3 update
- Spent time understanding the WasmEdge WASI-NN test process.
- Built a small unit test for the Neural Speed backend.
Week 4 update
- Spent time tracing the WasmEdge GGML implementation.
- Added a model download script.
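A model download script along the lines of the Week 4 item could look like the following sketch; the URL is a placeholder (not a real artifact location) and the skip-if-cached behavior is an assumption, not necessarily what the actual script does.

```python
# Hedged sketch of a model download helper. Only stdlib is used.
import os
import urllib.request

def dest_name(url: str) -> str:
    """Derive the local file name from the last path segment of the URL."""
    return url.rsplit("/", 1)[-1]

def download_model(url: str, out_dir: str = ".") -> str:
    """Fetch the model unless a local copy already exists."""
    path = os.path.join(out_dir, dest_name(url))
    if not os.path.exists(path):  # assumed caching behavior
        urllib.request.urlretrieve(url, path)
    return path

if __name__ == "__main__":
    # Placeholder URL; the real script would point at an actual model file.
    print(dest_name("https://example.com/models/example-model-q4.gguf"))
```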
Week 5 update
- Finished implementing the Neural Speed backend using an embedded Python interpreter.
Week 6 update
- Finished the WASI-NN implementation.
- Finished a simple example.
Week 7 update
- Added an installation tutorial.
- Improved the Neural Speed backend.
Week 8 update
- Improved the Neural Speed backend.
- Added a simple benchmark comparing Neural Speed and llama.cpp.
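The Week 8 comparison can be approximated by a tokens-per-second micro-benchmark. In this sketch both backends are replaced by a stub workload, so the printed number is not a real measurement; in the actual benchmark each call would run real inference through the Neural Speed backend or llama.cpp.

```python
# Tokens-per-second micro-benchmark sketch (stub workload, stdlib only).
import time

def tokens_per_second(generate, prompt: str, n_tokens: int) -> float:
    """Time one generation pass and report a decode rate."""
    start = time.perf_counter()
    generate(prompt, n_tokens)  # run one generation pass
    elapsed = time.perf_counter() - start
    return n_tokens / max(elapsed, 1e-9)  # guard against timer resolution

def stub_backend(prompt: str, n_tokens: int) -> str:
    # Stand-in workload; a real backend would decode n_tokens tokens.
    return " ".join("tok" for _ in range(n_tokens))

if __name__ == "__main__":
    rate = tokens_per_second(stub_backend, "benchmark prompt", 512)
    print(f"{rate:.1f} tokens/s (stub workload, not a real measurement)")
```

Measuring both backends with the same harness and the same prompt/token budget keeps the comparison apples-to-apples.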
Week 9 update
- Improved the Neural Speed backend.
- Fixed a compile error on the Windows platform.