Transformer error prompting THUDM_codegeex4-all-9b
saveriodesign opened this issue
Describe the bug
The model loads successfully in both 8-bit and 4-bit quant, confirmed by the VRAM usage observed in nvtop. Upon first prompting the model, the attached logs appear in journalctl. nvtop shows zero GPU compute usage throughout, while the high VRAM usage persists in both quant states.
This issue recommends pinning the transformers package to exactly 4.44.2. I am having trouble getting into the venv established by this repo to manually up- or downgrade transformers as required, and would like some guidance; my best guess at the procedure is sketched below.
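My unverified guess at the intended route is the bundled cmd_<os> helper followed by a plain pip install, roughly:

```sh
# Unverified guess at the procedure -- run from the text-generation-webui root.
./cmd_linux.sh                     # opens a shell inside the bundled environment
pip install transformers==4.44.2   # pin to the version recommended in the linked issue
```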
Is there an existing issue for this?
- I have searched the existing issues
Reproduction
- Download THUDM/codegeex4-all-9b using the web UI
- Load the model in 8- or 4-bit quant, trusting remote code
- Prompt the model
Screenshot
No response
Logs
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-571562 INFO Loaded "THUDM_codegeex4-all-9b" in 50.95 seconds.
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-573108 INFO LOADER: "Transformers"
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-573962 INFO TRUNCATION LENGTH: 2048
Oct 30 11:02:12 llama start_linux.sh[405]: 11:02:12-574881 INFO INSTRUCTION TEMPLATE: "Alpaca"
Oct 30 11:02:34 llama start_linux.sh[405]: Traceback (most recent call last):
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/queueing.py", line 566, in process_events
Oct 30 11:02:34 llama start_linux.sh[405]: response = await route_utils.call_process_api(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/route_utils.py", line 261, in call_process_api
Oct 30 11:02:34 llama start_linux.sh[405]: output = await app.get_blocks().process_api(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1786, in process_api
Oct 30 11:02:34 llama start_linux.sh[405]: result = await self.call_function(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/blocks.py", line 1350, in call_function
Oct 30 11:02:34 llama start_linux.sh[405]: prediction = await utils.async_iteration(iterator)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 583, in async_iteration
Oct 30 11:02:34 llama start_linux.sh[405]: return await iterator.__anext__()
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 576, in __anext__
Oct 30 11:02:34 llama start_linux.sh[405]: return await anyio.to_thread.run_sync(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
Oct 30 11:02:34 llama start_linux.sh[405]: return await get_async_backend().run_sync_in_worker_thread(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
Oct 30 11:02:34 llama start_linux.sh[405]: return await future
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 943, in run
Oct 30 11:02:34 llama start_linux.sh[405]: result = context.run(func, *args)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 559, in run_sync_iterator_async
Oct 30 11:02:34 llama start_linux.sh[405]: return next(iterator)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/gradio/utils.py", line 742, in gen_wrapper
Oct 30 11:02:34 llama start_linux.sh[405]: response = next(iterator)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 436, in generate_chat_reply_wrapper
Oct 30 11:02:34 llama start_linux.sh[405]: for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 403, in generate_chat_reply
Oct 30 11:02:34 llama start_linux.sh[405]: for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 348, in chatbot_wrapper
Oct 30 11:02:34 llama start_linux.sh[405]: prompt = generate_chat_prompt(text, state, **kwargs)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/chat.py", line 200, in generate_chat_prompt
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_length = get_encoded_length(prompt)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/text_generation.py", line 189, in get_encoded_length
Oct 30 11:02:34 llama start_linux.sh[405]: return len(encode(prompt)[0])
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/modules/text_generation.py", line 140, in encode
Oct 30 11:02:34 llama start_linux.sh[405]: input_ids = shared.tokenizer.encode(str(prompt), return_tensors='pt', add_special_tokens=add_special_tokens)
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2783, in encode
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_inputs = self.encode_plus(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3202, in encode_plus
Oct 30 11:02:34 llama start_linux.sh[405]: return self._encode_plus(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils.py", line 801, in _encode_plus
Oct 30 11:02:34 llama start_linux.sh[405]: return self.prepare_for_model(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^^^^^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3698, in prepare_for_model
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_inputs = self.pad(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: File "/home/llama/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3500, in pad
Oct 30 11:02:34 llama start_linux.sh[405]: encoded_inputs = self._pad(
Oct 30 11:02:34 llama start_linux.sh[405]: ^^^^^^^^^^
Oct 30 11:02:34 llama start_linux.sh[405]: TypeError: ChatGLM4Tokenizer._pad() got an unexpected keyword argument 'padding_side'
System Info
- Proxmox VE 8.2.7
- Debian 12 LXC
- Python 3.11
- transformers 4.45.2
- PNY RTX A2000 12GB
- Driver 535.183.01
- CUDA 12.2
I'm not going to close the issue, since the model I attempted to use still throws, but I will leave a workaround for anyone else having issues with codegeex4-all-9b.
Downgrading transformers to v4.44.2 can be done by editing its entry in requirements.txt in the repository root, then spinning up the venv by running the appropriate cmd_<os> script in the same directory and reinstalling with pip install --upgrade -r requirements.txt (sketched below). That alone, however, does not fix this model: the padding_side error goes away and is replaced by a new and exciting error that I did not care to troubleshoot or even record.
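For reference, on my Linux install the downgrade amounted to roughly the following (the helper script name differs per OS, and the exact requirements.txt line may vary between releases):

```sh
# After editing the transformers entry in requirements.txt to "transformers==4.44.2":
./cmd_linux.sh                              # cmd_windows.bat / cmd_macos.sh on other platforms
pip install --upgrade -r requirements.txt   # reinstalls the pinned version inside the venv
pip show transformers                       # should now report 4.44.2
```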
Instead, I updated this repository to the latest version and downloaded one of the GGUF quants from bartowski, which fired up flawlessly.
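For what it's worth, my reading of the traceback (not verified against the transformers source) is that newer transformers releases pass a padding_side keyword down into the tokenizer's _pad() hook, while the remote-code ChatGLM4Tokenizer bundled with the model still defines _pad() with the older signature. A toy sketch of that kind of mismatch, using made-up classes rather than the real ones:

```python
# Toy classes, not the real transformers/ChatGLM code: the base class forwards a
# newly added keyword argument to a hook that the subclass overrides with the
# old signature, producing the same kind of TypeError as in the log above.

class BaseTokenizer:
    def pad(self, encoded, padding_side="left"):
        # Newer base class behaviour: forward padding_side to the _pad() hook.
        return self._pad(encoded, padding_side=padding_side)

    def _pad(self, encoded, padding_side="left"):
        return encoded


class RemoteCodeTokenizer(BaseTokenizer):
    # Override written against the older base class: no padding_side parameter.
    def _pad(self, encoded):
        return encoded


RemoteCodeTokenizer().pad([1, 2, 3])
# TypeError: RemoteCodeTokenizer._pad() got an unexpected keyword argument 'padding_side'
```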