modal-labs / quillman

A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

Home Page: https://modal.com/docs/guide/llm-voice-chat

Could not find the quantized model in .pt or .safetensors format

bitnom opened this issue

commented

modal serve src.app

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
Downloading (…)okenizer_config.json: 100%|██████████| 727/727 [00:00<00:00, 196kB/s]
Downloading tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 94.7MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 411/411 [00:00<00:00, 511kB/s]
Loading GPTQ quantized model...
Could not find the quantized model in .pt or .safetensors format, exiting...
Traceback (most recent call last):
  File "/pkg/modal/_container_entrypoint.py", line 301, in handle_user_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 443, in call_function_async
    obj.__enter__()
  File "/root/src/llm_vicuna.py", line 79, in __enter__
    model = load_quantized(MODEL_NAME)
  File "/FastChat/fastchat/serve/load_gptq_model.py", line 60, in load_quantized
    exit()
  File "/usr/lib/python3.8/_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: None
Runner failed with exception: SystemExit(None)
2023-05-30T15:45:37+0000 User exception caught, exiting

Hi @bitnom, can you provide more details about how your Modal app is structured? The error suggests that the function load_quantized is looking for quantized weights in either the .pt (PyTorch) or the .safetensors (SafeTensors) format. Maybe updating your dependencies (fastchat?) would fix the issue?
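
For a quick sanity check, something like the snippet below (illustrative only; the directory path is a placeholder, not the repo's actual code) can be run inside the container to confirm whether any weights in a supported format are present where the loader expects them:

```python
# Illustrative only: check whether GPTQ weight files in a supported format
# exist in the directory the loader is pointed at. MODEL_DIR is a placeholder;
# substitute the path your app actually downloads the Vicuna weights to.
from pathlib import Path

MODEL_DIR = Path("/model")  # placeholder path

candidates = sorted(MODEL_DIR.glob("*.pt")) + sorted(MODEL_DIR.glob("*.safetensors"))
if candidates:
    print("Found quantized weight files:", [p.name for p in candidates])
else:
    print("No .pt or .safetensors files found; load_quantized() would exit here.")
```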

commented

I'm also running into this exact error when deploying the project without any changes.

I see. Do you have more details about the app you're running? Maybe a code snippet? It would make debugging on our end a lot easier.

I've experienced the same issue with the original code from this repo.

I found that download_model() in llm_vicuna.py#L18 isn't executed unless I add a print statement, e.g. print("Downloading model..."), to the method.

I see. We cache functions and it looks like the cached version is invalid. We may need to add a version ID to that function.
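
As a rough sketch (assuming the current modal.Image API and a made-up MODEL_VERSION marker; this is not the repo's actual code), changing anything in the body of the build-step function invalidates the cached layer, which is effectively what the added print statement does:

```python
# Minimal sketch, not the repo's code: bumping a marker inside the function
# body changes its contents, so Modal rebuilds the cached image layer that
# runs it (the same effect the added print statement has).
import modal

MODEL_VERSION = "v2"  # hypothetical marker; bump it to force a re-download

def download_model():
    # Placeholder for the real download logic in llm_vicuna.py.
    print(f"Downloading model weights (marker: {MODEL_VERSION}) ...")

image = (
    modal.Image.debian_slim()
    .pip_install("transformers")
    .run_function(download_model)
)
```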

Thank you @alexthewilde!

Can you all try again? We recently made some updates to the underlying model that should hopefully fix this.