LlamaEdge / LlamaEdge

The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge

Home Page: https://llamaedge.com/


bug: failed to run install !

navr32 opened this issue

Summary

Failed to install WasmEdge

Reproduction steps

Try to install with: bash <(curl -sSfL 'https://code.flows.network/webhook/iwYN1SdN3AmPgR5ao5Gt/run-llm.sh')

Screenshots

(screenshot omitted)

Any logs you want to share for showing the specific issue

Downloading Plugin: wasi_nn-ggml-cuda
ERROR - Download error from urllib: HTTP Error 404: Not Found
ERROR - URL: https://github.com/WasmEdge/WasmEdge/releases/download/0.13.5/WasmEdge-plugin-wasi_nn-ggml-cuda-0.13.5-manylinux2014_x86_64.tar.gz
Failed to install WasmEdge

Model Information

None; the app never starts, so no model could be tested.

Operating system information

Linux Manjaro, latest stable.

ARCH

Linux opus 6.6.19-1-MANJARO #1 SMP PREEMPT_DYNAMIC amd64

CPU Information

2x Intel(R) Xeon(R) CPU X5675 @ 3.07GHz

Memory Size

96GB

GPU Information

RTX3090

VRAM Size

24GB

The reason is that the WasmEdge installer cannot recognize this distro, so it tries the legacy release assets (manylinux2014, a.k.a. CentOS 7).
Could you please use the following command to install WasmEdge manually:

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasi_nn-ggml wasmedge_rustls --dist ubuntu20.04
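
If you want to confirm that the runtime and both plugins landed where run-llm.sh expects them, a quick check like the one below should do. It assumes the installer's default prefix of ~/.wasmedge; adjust the paths if you installed somewhere else.

    # Reload the shell config the installer updated, then check the runtime
    source ~/.zshrc          # or ~/.bashrc, depending on your shell
    wasmedge --version

    # Plugins are placed under the install prefix; make sure the wasi_nn (ggml)
    # and wasmedge_rustls shared objects are listed
    ls ~/.wasmedge/plugin/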

OK, thanks. This gives me:

opus% curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugins wasi_nn-ggml wasmedge_rustls --dist ubuntu20.04
Using Python: /usr/sbin/python3 
INFO    - CUDA cannot be detected via nvcc
WARNING - Experimental Option Selected: plugins
WARNING - plugins option may change later
INFO    - Compatible with current configuration
INFO    - Running Uninstaller
WARNING - SHELL variable not found. Using zsh as SHELL
INFO    - shell configuration updated
INFO    - Downloading WasmEdge
|============================================================|100.00 %INFO    - Downloaded
INFO    - Installing WasmEdge
INFO    - WasmEdge Successfully installed
INFO    - Downloading Plugin: wasi_nn-ggml-cuda
|============================================================|100.00 %INFO    - Downloaded
INFO    - Downloading Plugin: wasmedge_rustls
|============================================================|100.00 %INFO    - Downloaded
INFO    - Run:
source /home/nico/.zshrc
opus% 

Great, so it can use the ubuntu20.04 release assets.
You can edit this line to append --dist ubuntu20.04 to make it work as a workaround:
https://github.com/LlamaEdge/LlamaEdge/blob/main/run-llm.sh#L259

We are still modifying the detection part for some Linux distros.

OK, I tried, but it still gives me URL errors if I add it at line 259.
So I did a little searching and found that this works if I put it at line 322:
if curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- -v 0.13.5 --plugins wasi_nn-ggml wasmedge_rustls --dist ubuntu20.04 ; then

./run-llm.sh

[+] Installing WasmEdge with wasi-nn_ggml plugin ...

Using Python: /usr/sbin/python3 
INFO    - CUDA cannot be detected via nvcc
WARNING - Experimental Option Selected: plugins
WARNING - plugins option may change later
INFO    - Compatible with current configuration
INFO    - Running Uninstaller
WARNING - SHELL variable not found. Using zsh as SHELL
INFO    - shell configuration updated
INFO    - Downloading WasmEdge
|============================================================|100.00 %INFO    - Downloaded
INFO    - Installing WasmEdge
INFO    - WasmEdge Successfully installed
INFO    - Downloading Plugin: wasi_nn-ggml-cuda
|============================================================|100.00 %INFO    - Downloaded
INFO    - Downloading Plugin: wasmedge_rustls
|============================================================|100.00 %INFO    - Downloaded
INFO    - Run:
source /home/nico/.zshrc

    The WasmEdge Runtime is installed in /home/nico/.wasmedge/bin/wasmedge.


[+] Using cached model gemma-2b-it-Q5_K_M.gguf 
[+] Downloading the latest llama-api-server.wasm ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 9263k  100 9263k    0     0  16.9M      0 --:--:-- --:--:-- --:--:-- 16.9M

[+] Downloading Chatbot web app ...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 1721k  100 1721k    0     0  2869k      0 --:--:-- --:--:-- --:--:-- 2869k

[+] Will run the following command to start the server:

    wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-2b-it-Q5_K_M.gguf llama-api-server.wasm -p gemma-instruct -c 4096 --model-name gemma-2b-it --socket-addr 0.0.0.0:8080 --log-prompts --log-stat

    Chatbot web app can be accessed at http://0.0.0.0:8080 after the server is started


*********************************** LlamaEdge API Server ********************************

./run-llm.sh: line 380: 705080 Illegal instruction (core dumped) wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-2b-it-Q5_K_M.gguf llama-api-server.wasm -p gemma-instruct -c 4096 --model-name gemma-2b-it --socket-addr 0.0.0.0:8080 --log-prompts --log-stat

But now it core dumps... My CPU has no AVX, and that is probably the problem; I have hit the same issue before with llama.cpp and other projects and had to rebuild them without AVX to get them working.

The core dump in dmesg:
traps: wasmedge[705080] trap invalid opcode ip:7f03e6618910 sp:7ffc17311160 error:0 in libwasmedgePluginWasiNN.so[7f03e6439000+221000
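
If you want to confirm the AVX suspicion before rebuilding anything, the CPU flags reported by the kernel will tell you; the Xeon X5675 is a Westmere part, which predates AVX:

    # List the AVX-related CPU flags; no output means the CPU has no AVX,
    # so the prebuilt (AVX-enabled) plugin traps with an illegal instruction
    grep -o 'avx[^ ]*' /proc/cpuinfo | sort -u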

Hi @navr32,
For performance reasons, the pre-built version always enables the AVX ISA. If your hardware doesn't support it, please build from source yourself.
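
As a rough sketch of such a build (the exact CMake options vary between WasmEdge versions, so double-check the WasmEdge build docs; WASMEDGE_PLUGIN_WASI_NN_BACKEND=GGML is the documented flag for the ggml backend):

    # Sketch only; consult the WasmEdge build documentation for your version.
    # Building from source is the route suggested above for CPUs without AVX.
    git clone https://github.com/WasmEdge/WasmEdge.git
    cd WasmEdge
    cmake -Bbuild -DCMAKE_BUILD_TYPE=Release -DWASMEDGE_PLUGIN_WASI_NN_BACKEND=GGML .
    cmake --build build
    cmake --install build    # may need sudo for a system-wide prefix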

Yes, I have done that, and it runs. But now I have a problem: when the model is bigger than the VRAM, everything crashes. After searching, it seems that since the 5xx NVIDIA driver series the driver no longer uses shared memory, so CUDA cannot fall back to system RAM. Is there any solution to run a model bigger than 24 GB on the GPU with LlamaEdge? Thanks for the support and the replies; very good project.

You can set -g, --n-gpu-layers <N_GPU_LAYERS> to a smaller value.
It is the number of layers to run on the GPU [default: 100].

Ref: https://github.com/LlamaEdge/LlamaEdge/tree/main/api-server#cli-options-for-the-api-server
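
For example, reusing the command the script printed above but offloading only part of the model to the GPU (the value 24 is just a placeholder; pick whatever fits your model and quantization into 24 GB of VRAM):

    wasmedge --dir .:. --nn-preload default:GGML:AUTO:gemma-2b-it-Q5_K_M.gguf \
        llama-api-server.wasm -p gemma-instruct -c 4096 --model-name gemma-2b-it \
        --socket-addr 0.0.0.0:8080 --log-prompts --log-stat -g 24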