mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation

Home Page: https://llm.mlc.ai/


[Bug] Android app crashing/error with all models

richcb opened this issue · comments

πŸ› Bug

Every model I download either reports that the download is not complete and asks to redownload, or, once it is fully downloaded (or no such complaint appears), throws this error:

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: TVMError: Function vm.builtin.paged_attention_kv_cache_create_reduced(0: runtime.ShapeTuple, 1: int64_t, 2: int64_t, 3: int64_t, 4: int64_t, 5: int, 6: double, 7: double, 8: runtime.NDArray, 9: runtime.PackedFunc, 10: runtime.PackedFunc, 11: runtime.PackedFunc, 12: runtime.PackedFunc, 13: runtime.PackedFunc, 14: runtime.PackedFunc, 15: runtime.PackedFunc, 16: runtime.PackedFunc, 17: runtime.PackedFunc, 18: runtime.PackedFunc) -> relax.vm.AttentionKVCache expects 19 arguments, but 18 were provided.
Stack trace:
File "/Users/kartik/mlc/tvm/include/tvm/runtime/packed_func.h", line 1908

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:46)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:648)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:646)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:646)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$CXL6v4mjTu_Sr5Pk2zFDcus0R-8(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda2.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)
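The "expects 19 arguments, but 18 were provided" error typically means the prebuilt model library was compiled against a different version of the TVM runtime than the one bundled in the app: the compiled artifact records how many arguments it passes to a runtime builtin, and the runtime checks that count at call time. A minimal sketch of that failure mode (this is not MLC's actual implementation; the registry and function below are hypothetical):

```python
# Hypothetical registry mapping a runtime builtin's name to the number of
# arguments the *current* runtime expects. A model library compiled against
# an older runtime still passes the old argument count.
RUNTIME_BUILTINS = {
    "vm.builtin.paged_attention_kv_cache_create_reduced": 19,
}

def call_builtin(name, args):
    """Dispatch a builtin by name, validating the argument count first."""
    expected = RUNTIME_BUILTINS[name]
    if len(args) != expected:
        raise TypeError(
            f"Function {name} expects {expected} arguments, "
            f"but {len(args)} were provided."
        )
    return "ok"

# An artifact built for the older 18-argument signature reproduces the
# mismatch seen in the log above.
try:
    call_builtin("vm.builtin.paged_attention_kv_cache_create_reduced",
                 [None] * 18)
except TypeError as e:
    print(e)
```

The practical fix is to keep the app (runtime) and the downloaded model libraries in step: updating the APK without re-downloading matching model packages, or vice versa, can produce exactly this error.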

To Reproduce

Steps to reproduce the behavior:

  1. Install the app
  2. Download any model (the error log above is from an attempt to run gemma-2b-q4f16_1)
  3. Tap the chat icon to the right of the model name

Expected behavior

Text chat with a language model

Environment

Additional context

App attempts to initialize chat but throws the error either immediately after initialize or when interacting with the input box.

I have the exact same issue on a Galaxy A54 running Android 14.

Hi @DesolateIntention, I can chat with Gemma-2-2b-it-q4f16_1-MLC on a Samsung S23. Could you please try uninstalling and then reinstalling the app? If the issue persists, could you provide a log for debugging? Thanks.

I have the same problem on my Vivo X100.

@DesolateIntention @seabodylibra
First of all, try clearing the MLC Chat app data and cache, uninstalling the app, and rebooting the phone.

I have done this. The app now crashes at the Initialize toast and does not get far enough to produce an error log.
Screenshot_20240826_145104_MLCChat
Screenshot_20240826_145126_Nova Launcher

I have additionally cleared RAM, closed all background apps, and restarted the phone again.

I will try again with another model and/or the newest APK release, if there is a newer one.

UPDATE

Llama 2 and Llama 3 work, but all other models give these errors:

Gemma
MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: ValueError: Error when loading parameters from params_shard_8.bin: [14:57:01] /Users/kartik/mlc/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (29542400 vs. 5234233) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:46)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:648)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:646)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:646)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$CXL6v4mjTu_Sr5Pk2zFDcus0R-8(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda2.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)

Error message:
ValueError: Error when loading parameters from params_shard_8.bin: [14:57:01] /Users/kartik/mlc/tvm/src/runtime/relax_vm/ndarray_cache_support.cc:193: Check failed: this->nbytes == raw_data_buffer->length() (29542400 vs. 5234233) : ValueError: Encountered an corrupted parameter shard. It means it is not downloaded completely or downloading is interrupted. Please try to download again.
Stack trace:
File "/Users/kartik/mlc/tvm/src/runtime/relax_vm/ndarray_cache_support.cc", line 255
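The Gemma failure is different from the others: the runtime's size check (`this->nbytes == raw_data_buffer->length()`) found a shard on disk that is smaller than expected, i.e. an interrupted download. MLC-packaged weights ship with an `ndarray-cache.json` manifest listing each shard and its expected byte size; the field names used below (`records`, `dataPath`, `nbytes`) are my understanding of that format, so verify them against your model directory. A sketch of how one could find truncated shards before the runtime trips over them:

```python
import json
import os

def find_corrupted_shards(model_dir):
    """Compare each shard's on-disk size with the size recorded in
    ndarray-cache.json; any mismatch indicates an incomplete download."""
    with open(os.path.join(model_dir, "ndarray-cache.json")) as f:
        cache = json.load(f)
    bad = []
    for record in cache["records"]:
        path = os.path.join(model_dir, record["dataPath"])
        expected = record["nbytes"]
        # A missing file counts as 0 bytes downloaded.
        actual = os.path.getsize(path) if os.path.exists(path) else 0
        if actual != expected:
            bad.append((record["dataPath"], expected, actual))
    return bad
```

Re-downloading only the shards this reports (or clearing the model's directory and downloading it fresh) should resolve the size-check failure, assuming the download is not interrupted again.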

Phi

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: TVMError: Function vm.builtin.paged_attention_kv_cache_create_reduced(0: runtime.ShapeTuple, 1: int64_t, 2: int64_t, 3: int64_t, 4: int64_t, 5: int, 6: double, 7: double, 8: runtime.NDArray, 9: runtime.PackedFunc, 10: runtime.PackedFunc, 11: runtime.PackedFunc, 12: runtime.PackedFunc, 13: runtime.PackedFunc, 14: runtime.PackedFunc, 15: runtime.PackedFunc, 16: runtime.PackedFunc, 17: runtime.PackedFunc, 18: runtime.PackedFunc) -> relax.vm.AttentionKVCache expects 19 arguments, but 18 were provided.
Stack trace:
File "/Users/kartik/mlc/tvm/include/tvm/runtime/packed_func.h", line 1908

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:46)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:648)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:646)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:646)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$CXL6v4mjTu_Sr5Pk2zFDcus0R-8(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda2.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)

Error message:
TVMError: Function vm.builtin.paged_attention_kv_cache_create_reduced(0: runtime.ShapeTuple, 1: int64_t, 2: int64_t, 3: int64_t, 4: int64_t, 5: int, 6: double, 7: double, 8: runtime.NDArray, 9: runtime.PackedFunc, 10: runtime.PackedFunc, 11: runtime.PackedFunc, 12: runtime.PackedFunc, 13: runtime.PackedFunc, 14: runtime.PackedFunc, 15: runtime.PackedFunc, 16: runtime.PackedFunc, 17: runtime.PackedFunc, 18: runtime.PackedFunc) -> relax.vm.AttentionKVCache expects 19 arguments, but 18 were provided.
Stack trace:
File "/Users/kartik/mlc/tvm/include/tvm/runtime/packed_func.h", line 1908

Red Pajama

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: TVMError: Function vm.builtin.paged_attention_kv_cache_create_reduced(0: runtime.ShapeTuple, 1: int64_t, 2: int64_t, 3: int64_t, 4: int64_t, 5: int, 6: double, 7: double, 8: runtime.NDArray, 9: runtime.PackedFunc, 10: runtime.PackedFunc, 11: runtime.PackedFunc, 12: runtime.PackedFunc, 13: runtime.PackedFunc, 14: runtime.PackedFunc, 15: runtime.PackedFunc, 16: runtime.PackedFunc, 17: runtime.PackedFunc, 18: runtime.PackedFunc) -> relax.vm.AttentionKVCache expects 19 arguments, but 18 were provided.
Stack trace:
File "/Users/kartik/mlc/tvm/include/tvm/runtime/packed_func.h", line 1908

at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:46)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:648)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:646)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:548)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:646)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$CXL6v4mjTu_Sr5Pk2zFDcus0R-8(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda2.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)

Error message:
TVMError: Function vm.builtin.paged_attention_kv_cache_create_reduced(0: runtime.ShapeTuple, 1: int64_t, 2: int64_t, 3: int64_t, 4: int64_t, 5: int, 6: double, 7: double, 8: runtime.NDArray, 9: runtime.PackedFunc, 10: runtime.PackedFunc, 11: runtime.PackedFunc, 12: runtime.PackedFunc, 13: runtime.PackedFunc, 14: runtime.PackedFunc, 15: runtime.PackedFunc, 16: runtime.PackedFunc, 17: runtime.PackedFunc, 18: runtime.PackedFunc) -> relax.vm.AttentionKVCache expects 19 arguments, but 18 were provided.
Stack trace:
File "/Users/kartik/mlc/tvm/include/tvm/runtime/packed_func.h", line 1908

The app stops responding or crashes very often, upwards of 6 to 10 times per minute, and this is before initializing a model. After initialization, Llama 2 and Llama 3 work slowly but stably; other models still do not work.

Thank you for your explanation. I am encountering exactly the same issues the author posted earlier. Only Llama 2 and Llama 3 work, while all other models fail to initialize, leading to frequent crashes. I've also tried clearing the app data and cache, uninstalling, rebooting, and closing background apps, but the problem persists. Are there any updates or fixes available for this issue?