mlc-ai / web-llm

High-performance In-browser LLM Inference Engine

Home Page: https://webllm.mlc.ai


Tried to run gemma-7b but it failed

mayneyao opened this issue

Recently I integrated webllm into my web project, and gemma-2b works quite well. Thanks for your work, everything runs smoothly. Now I am trying to add more powerful models.

I noticed that there is a quantized version of gemma-7b on Hugging Face, but there is no corresponding model library in https://github.com/mlc-ai/binary-mlc-llm-libs.

I tried to compile the wasm for gemma-7b following the documentation, but then found that loading the model errors out at around 82/101:

Loading model from cache[82/101]: 2935MB loaded. 51% completed, 5 secs elapsed.

I saw "Here" in the console, and followed the code to find it here.

https://github.com/apache/tvm/blob/657880cdcedd7e41e911c583a8e93b3053a6ad27/web/src/runtime.ts#L82

Here is my configuration:

  {
    model_url: "http://localhost:5173/webllm/files/gemma-7b-it-q4f16_2-MLC/",
    local_id: "gemma-7b-it-q4f16_2",
    model_lib_url:
      "https://raw.githubusercontent.com/mayneyao/binary-mlc-llm-libs/main/gemma-7b-it/gemma-7b-it-q4f16_2-MLC-webgpu.wasm",
  },
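
For context, this is roughly how I wire that entry into web-llm (a minimal sketch; I am on the ChatModule/reload API, and the appConfig variable name is just mine):

import * as webllm from "@mlc-ai/web-llm";

const appConfig: webllm.AppConfig = {
  model_list: [
    {
      model_url: "http://localhost:5173/webllm/files/gemma-7b-it-q4f16_2-MLC/",
      local_id: "gemma-7b-it-q4f16_2",
      model_lib_url:
        "https://raw.githubusercontent.com/mayneyao/binary-mlc-llm-libs/main/gemma-7b-it/gemma-7b-it-q4f16_2-MLC-webgpu.wasm",
    },
  ],
};

const chat = new webllm.ChatModule();
// Progress callback; the "Loading model from cache[...]" lines above are reported here.
chat.setInitProgressCallback((report) => console.log(report.text));
// Load the locally served gemma-7b weights together with the compiled wasm lib.
await chat.reload("gemma-7b-it-q4f16_2", undefined, appConfig);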


I successfully compiled gemma-7b-it-q4f16_2-metal.so following the instructions here: https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/models/demo_gemma.ipynb, and it runs fine on my MBP. So the bug seems to appear only in the wasm build.

Has anyone successfully run gemma-7b? Or does anyone have suggestions on how to troubleshoot this issue?

I have the same problem. It seems like mlc_chat compile ... --device webgpu does not work for gemma 7b :/
I do get the following error in the console:

Here
worker.ts:54 Error: TVMError: std::bad_alloc
16 Here
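
Not sure yet whether this is the actual cause, but std::bad_alloc during loading points to an allocation failure, and a 7B q4f16 model needs several GB of buffers. One quick thing to check is what the WebGPU adapter in the browser actually reports (a minimal sketch using the standard WebGPU API; the helper name is mine):

// Sketch: print the WebGPU limits this browser exposes, to compare against the
// size of the gemma-7b weights. Whether a limit is what triggers bad_alloc here
// is only a guess.
async function logWebGPULimits(): Promise<void> {
  const adapter = await navigator.gpu?.requestAdapter();
  if (!adapter) {
    console.log("WebGPU is not available in this browser.");
    return;
  }
  console.log("maxBufferSize:", adapter.limits.maxBufferSize);
  console.log("maxStorageBufferBindingSize:", adapter.limits.maxStorageBufferBindingSize);
}

logWebGPULimits();

Since bad_alloc is a C++ exception, the failing allocation could also be in the wasm module's own 32-bit heap (capped at 4 GB) rather than on the GPU side.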