oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.

Transformers error when prompting THUDM_codegeex4-all-9b

saveriodesign opened this issue

Describe the bug

The model loads successfully in both 8-bit and 4-bit quantization, confirmed by the VRAM usage observed in nvtop. Upon first prompting the model, the logs attached below appear in journalctl. nvtop shows zero GPU utilization throughout, while VRAM usage remains high in both quantization states.

This issue recommends pinning the transformers package to exactly 4.44.2, presumably because newer releases pass a padding_side argument that the model's custom ChatGLM4Tokenizer does not accept (see the traceback below). I am having trouble getting a shell inside the venv that this repo's installer creates so that I can upgrade or downgrade transformers as required, and would like some guidance.
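
For reference, the change I am trying to make once inside that environment is just a version pin, roughly the following (a sketch; the exact version comes from the recommendation above):

    # pin transformers to the recommended release inside the webui's environment
    pip install transformers==4.44.2
    # confirm the installed version
    python -c "import transformers; print(transformers.__version__)"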

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

  • Download THUDM/codegeex4-all-9b using the web UI
  • Load the model in 8- or 4-bit quant, trusting remote code
  • Prompt the model
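
For completeness, I believe the same configuration can be launched from the command line by passing the usual server.py flags through the start script (a sketch; flag names are assumed from the standard options rather than anything specific to this model):

    # assumed CLI equivalent of the UI steps above, run from the repo root
    ./start_linux.sh --model THUDM_codegeex4-all-9b --load-in-8bit --trust-remote-code
    # or, for the 4-bit case:
    ./start_linux.sh --model THUDM_codegeex4-all-9b --load-in-4bit --trust-remote-code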

Screenshot

No response

Logs

Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-571562 INFO Loaded "THUDM_codegeex4-all-9b" in 50.95 seconds.
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-573108 INFO LOADER: "Transformers"
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-573962 INFO TRUNCATION LENGTH: 2048
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-574881 INFO INSTRUCTION TEMPLATE: "Alpaca"
Oct 30 11:02:34 llama start_linux.sh[405]: Traceback (most recent call last): 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/queueing.py", line 566, in process_events 
Oct 30 11:02:34 llama start_linux.sh[405]: response = await route_utils.call_process_api( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/route_utils.py", line 261, in call_process_api 
Oct 30 11:02:34 llama start_linux.sh[405]: output = await app.get_blocks().process_api( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1786, in process_api 
Oct 30 11:02:34 llama start_linux.sh[405]: result = await self.call_function( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1350, in call_function 
Oct 30 11:02:34 llama start_linux.sh[405]: prediction = await utils.async_iteration(iterator) 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 583, in async_iteration 
Oct 30 11:02:34 llama start_linux.sh[405]: return await iterator.__anext__() 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 576, in __anext__ 
Oct 30 11:02:34 llama start_linux.sh[405]: return await anyio.to_thread.run_sync( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync 
Oct 30 11:02:34 llama start_linux.sh[405]: return await get_async_backend().run_sync_in_worker_thread( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread 
Oct 30 11:02:34 llama start_linux.sh[405]: return await future 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 943, in run 
Oct 30 11:02:34 llama start_linux.sh[405]: result = context.run(func, *args) 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 559, in run_sync_iterator_async 
Oct 30 11:02:34 llama start_linux.sh[405]: return next(iterator) 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 742, in gen_wrapper 
Oct 30 11:02:34 llama start_linux.sh[405]: response = next(iterator) 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 436, in generate_chat_reply_wrapper 
Oct 30 11:02:34 llama start_linux.sh[405]: for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)): 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 403, in generate_chat_reply 
Oct 30 11:02:34 llama start_linux.sh[405]: for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui): 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 348, in chatbot_wrapper 
Oct 30 11:02:34 llama start_linux.sh[405]: prompt = generate_chat_prompt(text, state, **kwargs) 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 200, in generate_chat_prompt 
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_length = get_encoded_length(prompt) 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/text_generation.py", line 189, in get_encoded_length 
Oct 30 11:02:34 llama start_linux.sh[405]: return len(encode(prompt)[0]) 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/text_generation.py", line 140, in encode 
Oct 30 11:02:34 llama start_linux.sh[405]: input_ids = shared.tokenizer.encode(str(prompt), return_tensors='pt', add_special_tokens=add_special_tokens) 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2783, in encode 
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_inputs = self.encode_plus( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3202, in encode_plus 
Oct 30 11:02:34 llama start_linux.sh[405]: return self._encode_plus( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils.py", line 801, in _encode_plus 
Oct 30 11:02:34 llama start_linux.sh[405]: return self.prepare_for_model( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3698, in prepare_for_model 
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_inputs = self.pad( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3500, in pad 
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_inputs = self._pad( 
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^ 
Oct 30 11:02:34 llama start_linux.sh[405]: TypeError: ChatGLM4Tokenizer._pad() got an unexpected keyword argument 'padding_side'

System Info

- Proxmox VE 8.2.7
  - Debian 12 LXC
    - Python 3.11
      - transformers 4.45.2
- PNY RTX A2000 12GB
  - Driver 535.183.01
  - CUDA 12.2

I'm not going to close the issue, since the model I attempted to use still throws errors, but I will document a workaround for anyone else having issues with Codegeex4-all-9B.

Downgrading transformers to v4.44.2 can be done by editing its entry in requirements.txt in the repository root. The venv can then be entered by running the appropriate cmd_<os> script in the same directory, and the downgrade applied with pip install --upgrade -r requirements.txt. This did not fix the model for me, however: the padding_side error goes away, but it is replaced by a new and exciting error that I did not care to troubleshoot or even record.
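
Concretely, the sequence looks something like this on Linux (a sketch of the steps above; on other platforms substitute the matching cmd_<os> script):

    # 1. in the repository root, edit the transformers entry in requirements.txt
    #    so that it reads: transformers==4.44.2
    # 2. enter the installer-managed venv
    ./cmd_linux.sh
    # 3. apply the pinned requirements from inside that shell
    pip install --upgrade -r requirements.txt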

Instead, I updated this repository to the latest version and downloaded one of the GGUF quantizations from bartowski, which fired up flawlessly.
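
For anyone taking the same route, the download can also be done with the repo's own download script instead of the web UI (a sketch; the bartowski repository and file names below are assumptions, so check the Hugging Face page for the quant you actually want):

    # run from the repo root, inside the installer-managed venv
    ./cmd_linux.sh
    # repository and file names are illustrative -- substitute the quant you want
    python download-model.py bartowski/codegeex4-all-9b-GGUF --specific-file codegeex4-all-9b-Q4_K_M.gguf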