modal-labs / quillman

A chat app that transcribes audio in real-time, streams back a response from a language model, and synthesizes this response as natural-sounding speech.

Home Page: https://modal.com/docs/guide/llm-voice-chat

Could not find the quantized model in .pt or .safetensors format

bitnom opened this issue

commented

modal serve src.app

The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
0it [00:00, ?it/s]
Downloading (…)okenizer_config.json: 100%|██████████| 727/727 [00:00<00:00, 196kB/s]
Downloading tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 94.7MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 411/411 [00:00<00:00, 511kB/s]
Loading GPTQ quantized model...
Could not find the quantized model in .pt or .safetensors format, exiting...
Traceback (most recent call last):
  File "/pkg/modal/_container_entrypoint.py", line 301, in handle_user_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 443, in call_function_async
    obj.__enter__()
  File "/root/src/llm_vicuna.py", line 79, in __enter__
    model = load_quantized(MODEL_NAME)
  File "/FastChat/fastchat/serve/load_gptq_model.py", line 60, in load_quantized
    exit()
  File "/usr/lib/python3.8/_sitebuiltins.py", line 26, in __call__
    raise SystemExit(code)
SystemExit: None
Runner failed with exception: SystemExit(None)
2023-05-30T15:45:37+0000 User exception caught, exiting

Hi @bitnom, can you provide more details about how your Modal app is structured? The error suggests that the function load_quantized is looking for quantized weights in either the .pt (PyTorch) or the .safetensors (SafeTensors) format. Maybe updating your dependencies (fastchat?) would fix the issue?
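
For a quick sanity check, something like the snippet below (illustrative only; the directory path is a placeholder, not the repo's actual code) can be run inside the container to confirm whether any weights in a supported format are present where the loader expects them:

```python
# Illustrative only: check whether GPTQ weight files in a supported format
# exist in the directory the loader is pointed at. MODEL_DIR is a placeholder;
# substitute the path your app actually downloads the Vicuna weights to.
from pathlib import Path

MODEL_DIR = Path("/model")  # placeholder path

candidates = sorted(MODEL_DIR.glob("*.pt")) + sorted(MODEL_DIR.glob("*.safetensors"))
if candidates:
    print("Found quantized weight files:", [p.name for p in candidates])
else:
    print("No .pt or .safetensors files found; load_quantized() would exit here.")
```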

commented

I'm also running into this exact error when deploying the project without any changes.

I see. Do you have more details about the app you're running? Maybe a code snippet? It would make debugging on our end a lot easier.

I've experienced the same issue with the original code from this repo.

I found that download_model() in llm_vicuna.py#L18 isn't executed unless I add a print statement, e.g. print("Downloading model..."), to the method.

I see. We cache functions and it looks like the cached version is invalid. We may need to add a version ID to that function.
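
As a rough sketch (assuming the current modal.Image API and a made-up MODEL_VERSION marker; this is not the repo's actual code), changing anything in the body of the build-step function invalidates the cached layer, which is effectively what the added print statement does:

```python
# Minimal sketch, not the repo's code: bumping a marker inside the function
# body changes its contents, so Modal rebuilds the cached image layer that
# runs it (the same effect the added print statement has).
import modal

MODEL_VERSION = "v2"  # hypothetical marker; bump it to force a re-download

def download_model():
    # Placeholder for the real download logic in llm_vicuna.py.
    print(f"Downloading model weights (marker: {MODEL_VERSION}) ...")

image = (
    modal.Image.debian_slim()
    .pip_install("transformers")
    .run_function(download_model)
)
```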

Thank you @alexthewilde!

Can you all try again? We recently made some updates to the underlying model that should hopefully fix this.