mnn-llm

An LLM deployment project based on MNN.


Supported Models

To export an LLM to an ONNX model, use llm-export.
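As a rough sketch of such an export (the script name llm_export.py and the --path/--export flags below are assumptions, not verified against that repo; consult the llm-export README for the actual CLI):

# sketch only: export a local chatglm2-6b checkpoint to ONNX with llm-export
# (script name and flags are assumed, check the llm-export repository)
python llm_export.py --path ../chatglm2-6b --export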

Currently supported models:

model              onnx-fp32  mnn-int4
chatglm-6b         Download   Download
chatglm2-6b        Download   Download
codegeex2-6b       Download   Download
Qwen-7B-Chat       Download   Download
Baichuan2-7B-Chat  Download   Download
Llama-2-7b-chat    Download   Download

Download int4 models

# <model> like `chatglm-6b`
# linux/macos
./script/download_model.sh <model>

# windows
./script/download_model.ps1 <model>
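For example, to fetch the int4 weights for chatglm2-6b from the table above:

# linux/macos
./script/download_model.sh chatglm2-6b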

Build

Current build status:

System    Build Status
Linux     Build Status
macOS     Build Status
Windows   Build Status
Android   Build Status

Local build

# linux
./script/linux_build.sh

# macos
./script/macos_build.sh

# windows msvc
./script/windows_build.ps1

# android
./script/android_build.sh

The CPU backend is used by default. To use a different backend, add the corresponding MNN compile macro in the build script, as sketched after this list:

  • cuda: -DMNN_CUDA=ON
  • opencl: -DMNN_OPENCL=ON
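A minimal sketch of what that looks like, assuming the build scripts configure MNN with CMake in the usual way (the real scripts' directory layout and existing flags may differ):

# sketch: add the backend macro to the cmake configure step (flags assumed)
cmake -DMNN_CUDA=ON -DCMAKE_BUILD_TYPE=Release ..
make -j4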

Run

# linux/macos
./cli_demo # cli demo
./web_demo # web ui demo

# windows
.\Debug\cli_demo.exe
.\Debug\web_demo.exe

# android
# push the runtime libraries and the demo binary to the device
adb push libs/*.so build/libllm.so build/cli_demo /data/local/tmp
# push the model directory
adb push model_dir /data/local/tmp
# run the cli demo against the pushed model directory
adb shell "cd /data/local/tmp && export LD_LIBRARY_PATH=. && ./cli_demo -m model"


License: Apache License 2.0