LFX Workspace: Integrate Intel Extension for Transformers as a new WASI-NN backend
grorge123 opened this issue · comments
Summary
Motivation
With the rise of Large Language Models (LLMs), there's an increasing demand for running LLM models on CPUs. The Intel Extension for Transformers is a toolkit that can accelerate LLM models on various Intel platforms. The goal of this task is to integrate the Intel Extension for Transformers into the WASI-NN backend to enhance the performance of running LLMs on CPUs.
Details
Milestones:
- Understanding WASI-NN and Intel Extension for Transformers (week 2 ~ 3)
- Study how existing WASI-NN supports GGML (https://github.com/second-state/wasmedge-wasi-nn/blob/ggml/rust/src/lib.rs)
- Research how WasmEdge connects to GGML (https://github.com/WasmEdge/WasmEdge/blob/master/plugins/wasi_nn/ggml.cpp)
- Study the Intel Extension for Transformers (https://github.com/intel/intel-extension-for-transformers)
- Simple unit testing (week 4)
- Conduct simple runtime tests
- Implementing the Plugin Part (week 4~8)
- Research how to compile the Intel Extension for Transformers into a library file
- Add a WasmEdge Intel Extension for Transformers plugin
- Add WASI-NN support for the Intel Extension for Transformers
- Add a new function to the install.sh script for package installation
- Complete Unit Testing and Create Sample Tutorials (week 9 ~ week 10)
- Complete all unit tests
- Complete sample tutorials
- Documentation (week 11 ~ week 12)
- Complete the documentation.
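The GGML study items in the first milestone boil down to the five-call WASI-NN flow (load, init_execution_context, set_input, compute, get_output) that any new backend must implement. Below is a minimal Python sketch of that flow against a hypothetical mock backend; all names and behavior are illustrative stand-ins for the plugin-side state, not the real WasmEdge plugin API.

```python
# Sketch of the WASI-NN call sequence a new backend must support.
# Real guests invoke these as host functions via wasmedge-wasi-nn;
# this in-process mock only illustrates the call flow.

class MockNeuralSpeedBackend:
    """Stands in for the plugin-side state; everything here is illustrative."""

    def __init__(self):
        self.graphs = {}    # graph handle -> model bytes
        self.contexts = {}  # context handle -> per-request state
        self._next = 0

    def load(self, model_bytes: bytes) -> int:
        # The real plugin would parse/initialize the model here.
        handle = self._next
        self._next += 1
        self.graphs[handle] = model_bytes
        return handle

    def init_execution_context(self, graph: int) -> int:
        ctx = self._next
        self._next += 1
        self.contexts[ctx] = {"graph": graph, "input": None, "output": None}
        return ctx

    def set_input(self, ctx: int, tensor):
        self.contexts[ctx]["input"] = tensor

    def compute(self, ctx: int):
        # Placeholder "inference": reverse the input. A real backend
        # would run model inference here.
        self.contexts[ctx]["output"] = self.contexts[ctx]["input"][::-1]

    def get_output(self, ctx: int):
        return self.contexts[ctx]["output"]


backend = MockNeuralSpeedBackend()
g = backend.load(b"fake-model-weights")
ctx = backend.init_execution_context(g)
backend.set_input(ctx, [1, 2, 3])
backend.compute(ctx)
print(backend.get_output(ctx))  # -> [3, 2, 1]
```

The same handle-based shape appears in the GGML backend linked above, which is why the plugin work can largely mirror the existing GGML code paths.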
Appendix
Week 1 update
- Switched to the neural-speed repository (in place of Intel Extension for Transformers) to implement the model computation part.
- Spent time understanding the WASI-NN code in WasmEdge and wasmedge-wasi-nn.
Week 2 update
- Successfully embedded Neural Speed Python code in a C++ program.
- Created a new backend struct in wasmedge-wasi-nn.
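One way to structure the Week 2 embedding is a small Python wrapper module that the C++ host imports through the embedded interpreter (e.g. via PyImport_ImportModule / PyObject_CallMethod). The sketch below is hedged: the `load_model`/`generate` names are hypothetical, and the real neural_speed calls are replaced with stubs so the snippet stays self-contained.

```python
# Hypothetical wrapper module the C++ host could call through embedded
# CPython. The real backend would call into neural_speed here; these
# stubs keep the example runnable, since the exact neural_speed API is
# an assumption rather than something confirmed by the issue.

_model = None

def load_model(path: str) -> bool:
    """Called once by the host; would construct the Neural Speed model."""
    global _model
    _model = {"path": path}  # stub standing in for a real model object
    return True

def generate(prompt: str, max_tokens: int = 8) -> str:
    """Called per request; would run Neural Speed text generation."""
    if _model is None:
        raise RuntimeError("load_model() must be called first")
    # Stub "generation": truncate the prompt to max_tokens words.
    return " ".join(prompt.split()[:max_tokens])

if __name__ == "__main__":
    load_model("model.bin")  # hypothetical model file name
    print(generate("Hello from the C++ host embedding Python", 3))
```

Keeping the host-facing surface to a couple of module-level functions makes the C++ side of the embedding simple: it only needs to import one module and call two methods.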
Week 3 update
- Spent time understanding the WasmEdge WASI-NN test process.
- Built a small unit test for the Neural Speed backend.
Week 4 update
- Spent time tracing the WasmEdge GGML implementation.
- Added a model download script.
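A model download script along the lines of the Week 4 item could look like the following sketch; the URL is a placeholder (not a real artifact location) and the skip-if-cached behavior is an assumption, not necessarily what the actual script does.

```python
# Hedged sketch of a model download helper. Only stdlib is used.
import os
import urllib.request

def dest_name(url: str) -> str:
    """Derive the local file name from the last path segment of the URL."""
    return url.rsplit("/", 1)[-1]

def download_model(url: str, out_dir: str = ".") -> str:
    """Fetch the model unless a local copy already exists."""
    path = os.path.join(out_dir, dest_name(url))
    if not os.path.exists(path):  # assumed caching behavior
        urllib.request.urlretrieve(url, path)
    return path

if __name__ == "__main__":
    # Placeholder URL; the real script would point at an actual model file.
    print(dest_name("https://example.com/models/example-model-q4.gguf"))
```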
Week 5 update
- Finished implementing the Neural Speed backend using an embedded Python interpreter.
Week 6 update
- Finished the WASI-NN implementation.
- Finished a simple example.
Week 7 update
- Added an installation tutorial.
- Improved the Neural Speed backend.
Week 8 update
- Improved the Neural Speed backend.
- Added a simple benchmark comparing Neural Speed and llama.cpp.
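The Week 8 comparison can be approximated by a tokens-per-second micro-benchmark. In this sketch both backends are replaced by a stub workload, so the printed number is not a real measurement; in the actual benchmark each call would run real inference through the Neural Speed backend or llama.cpp.

```python
# Tokens-per-second micro-benchmark sketch (stub workload, stdlib only).
import time

def tokens_per_second(generate, prompt: str, n_tokens: int) -> float:
    """Time one generation pass and report a decode rate."""
    start = time.perf_counter()
    generate(prompt, n_tokens)  # run one generation pass
    elapsed = time.perf_counter() - start
    return n_tokens / max(elapsed, 1e-9)  # guard against timer resolution

def stub_backend(prompt: str, n_tokens: int) -> str:
    # Stand-in workload; a real backend would decode n_tokens tokens.
    return " ".join("tok" for _ in range(n_tokens))

if __name__ == "__main__":
    rate = tokens_per_second(stub_backend, "benchmark prompt", 512)
    print(f"{rate:.1f} tokens/s (stub workload, not a real measurement)")
```

Measuring both backends with the same harness and the same prompt/token budget keeps the comparison apples-to-apples.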
Week 9 update
- Improved the Neural Speed backend.
- Fixed a compile error on the Windows platform.