WasmEdge / WasmEdge

WasmEdge is a lightweight, high-performance, and extensible WebAssembly runtime for cloud native, edge, and decentralized applications. It powers serverless apps, embedded functions, microservices, smart contracts, and IoT devices.

Home Page: https://WasmEdge.org

LFX Workspace: Integrate Intel Extension for Transformers as a new WASI-NN backend

grorge123 opened this issue

Summary

Motivation

With the rise of large language models (LLMs), there is increasing demand for running them on CPUs. The Intel Extension for Transformers is a toolkit that can accelerate LLMs on various Intel platforms. The goal of this task is to integrate the Intel Extension for Transformers as a new WASI-NN backend to improve the performance of running LLMs on CPUs.

Details

Milestones:

  1. Understand WASI-NN and the Intel Extension for Transformers (week 2 ~ 3)
  2. Simple unit testing (week 4)
    • Conduct simple runtime tests
  3. Implement the plugin (week 4 ~ 8)
    • Research how to compile the Intel Extension for Transformers into a library file
    • Add a WasmEdge Intel Extension for Transformers plugin
    • Add WASI-NN support for the Intel Extension for Transformers
    • Add a new function to the install.sh script for package installation
  4. Complete unit testing and create sample tutorials (week 9 ~ 10)
    • Complete all unit tests
    • Complete sample tutorials
  5. Documentation (week 11 ~ 12)
    • Complete the documentation

Appendix

Week 1 update

  • Decided to use the neural-speed repo in place of the Intel Extension for Transformers to implement the model computation part.
  • Spent time understanding the WASI-NN code in WasmEdge and wasmedge-wasi-nn.

Week 2 update

  • Successfully embedded Neural Speed Python code in a C++ program.
  • Created a new backend struct in wasmedge-wasi-nn.

Week 3 update

  • Spent time understanding the WasmEdge WASI-NN test process.
  • Built a small unit test for the Neural Speed backend.

Week 4 update

  • Spent time tracing the WasmEdge GGML implementation.
  • Added a model download script.
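
A model download step of this kind can be sketched as follows. This is only an illustrative sketch, not the script added to the repository: the function names `download_model` and `sha256_of`, the `.part` temporary suffix, and the optional checksum argument are all assumptions made for the example.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path
from typing import Optional


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_model(url: str, dest: Path, sha256: Optional[str] = None) -> Path:
    """Fetch `url` into `dest` (hypothetical helper), skipping the download
    if `dest` already exists, and optionally verifying a SHA-256 checksum."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file first so an interrupted download
        # never leaves a truncated model at the final path.
        tmp = dest.with_name(dest.name + ".part")
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        tmp.replace(dest)
    if sha256 is not None and sha256_of(dest) != sha256.lower():
        raise ValueError(f"checksum mismatch for {dest}")
    return dest
```

Skipping the download when the file already exists keeps repeated test runs cheap, and the checksum guards against a corrupted or partially cached model.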

Week 5 update

  • Finished implementing the Neural Speed backend using the Python interpreter.

Week 6 update

  • Finished the WASI-NN implementation.
  • Finished a simple example.

Week 7 update

  • Added an installation tutorial.
  • Improved the Neural Speed backend.

Week 8 update

  • Improved the Neural Speed backend.
  • Added a simple benchmark comparing Neural Speed and llama.cpp.

Week 9 update

  • Improved the Neural Speed backend.
  • Fixed a compilation error on the Windows platform.