intel / intel-extension-for-pytorch

A Python package for extending the official PyTorch so that users can easily obtain performance gains on Intel platforms

IPEX v2.1.20+xpu regression with SDXL CLIP Text Encoder

simonlui opened this issue

Describe the bug

Something broke between d9455e8, the commit where I compiled the fixes for #483, and the release of IPEX v2.1.20+xpu. I now get the following error when I run the SDXL CLIP text encoder on the GPU with --gpu-only in ComfyUI, as described previously.

got prompt
model_type EPS
Using pytorch attention in VAE
Using pytorch attention in VAE
Requested to load SDXLRefinerClipModel
Loading 1 new model
/deps/venv/lib/python3.10/site-packages/intel_extension_for_pytorch/frontend.py:465: UserWarning: Conv BatchNorm folding failed during the optimize process.
  warnings.warn(
/deps/venv/lib/python3.10/site-packages/intel_extension_for_pytorch/frontend.py:472: UserWarning: Linear BatchNorm folding failed during the optimize process.
  warnings.warn(
/deps/venv/lib/python3.10/site-packages/intel_extension_for_pytorch/nn/utils/_parameter_wrapper.py:329: UserWarning: WARNING: Can't convert model's parameters dtype from torch.float8_e4m3fn to torch.bfloat16
  warnings.warn(
!!! Exception during processing !!!
Traceback (most recent call last):
  File "/ComfyUI/execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/ComfyUI/execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/ComfyUI/execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/ComfyUI/nodes.py", line 57, in encode
    cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
  File "/ComfyUI/comfy/sd.py", line 136, in encode_from_tokens
    cond, pooled = self.cond_stage_model.encode_token_weights(tokens)
  File "/ComfyUI/comfy/sd1_clip.py", line 517, in encode_token_weights
    out, pooled = getattr(self, self.clip).encode_token_weights(token_weight_pairs)
  File "/ComfyUI/comfy/sd1_clip.py", line 40, in encode_token_weights
    out, pooled = self.encode(to_encode)
  File "/ComfyUI/comfy/sd1_clip.py", line 196, in encode
    return self(tokens)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ComfyUI/comfy/sd1_clip.py", line 178, in forward
    outputs = self.transformer(tokens, attention_mask, intermediate_output=self.layer_idx, final_layer_norm_intermediate=self.layer_norm_hidden_state)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ComfyUI/comfy/clip_model.py", line 134, in forward
    x = self.text_model(*args, **kwargs)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ComfyUI/comfy/clip_model.py", line 109, in forward
    x, i = self.encoder(x, mask=mask, intermediate_output=intermediate_output)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ComfyUI/comfy/clip_model.py", line 68, in forward
    x = l(x, mask, optimized_attention)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ComfyUI/comfy/clip_model.py", line 49, in forward
    x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/deps/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/ComfyUI/comfy/clip_model.py", line 20, in forward
    out = optimized_attention(q, k, v, self.heads, mask)
  File "/ComfyUI/comfy/ldm/modules/attention.py", line 345, in attention_pytorch
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False)
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

I am not sure whether this is a full regression of that issue or something else entirely, but it feels like the latter.
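For anyone who wants to poke at this outside of ComfyUI, here is a minimal standalone sketch of the failing call. This is not code from the report: the shapes, mask layout, and dtype are assumptions about what the SDXL CLIP text encoder passes into attention_pytorch.

```python
# Hypothetical reproducer (not from the original report): mirrors the failing
# scaled_dot_product_attention call in ComfyUI's attention_pytorch.
# Shapes are illustrative guesses for a CLIP text encoder batch (77 tokens).
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401 -- registers the xpu device

heads, seq_len, head_dim = 20, 77, 64
device, dtype = "xpu", torch.float16

q = torch.randn(1, heads, seq_len, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)
mask = torch.zeros(1, 1, seq_len, seq_len, device=device, dtype=dtype)  # additive mask

# Per this report, IPEX v2.1.20+xpu fails here with
# "IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)".
out = torch.nn.functional.scaled_dot_product_attention(
    q, k, v, attn_mask=mask, dropout_p=0.0, is_causal=False
)
torch.xpu.synchronize()
print("OK:", out.shape)
```

On a healthy build this just prints the output shape; if the regression is in the masked SDPA path on XPU, it should raise the same IndexError on v2.1.20+xpu.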

Versions

PyTorch version: 2.1.0.post0+cxx11.abi
PyTorch CXX11 ABI: Yes
IPEX version: 2.1.20+xpu
IPEX commit: b78b4d97e
Build type: Release

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: N/A
Clang version: N/A
IGC version: N/A
CMake version: version 3.28.4
Libc version: glibc-2.35

Python version: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.7.10-200.fc39.x86_64-x86_64-with-glibc2.35
Is XPU available: True
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration: 
[0] _DeviceProperties(name='Intel(R) Arc(TM) A770 Graphics', platform_name='Intel(R) Level-Zero', dev_type='gpu, support_fp64=1, total_memory=16288MB, max_compute_units=512, gpu_eu_count=512)
Intel OpenCL ICD version: 23.17.26241.33-647~22.04
Level Zero version: 1.3.26241.33-647~22.04

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        48 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               32
On-line CPU(s) list:                  0-31
Vendor ID:                            AuthenticAMD
Model name:                           AMD Ryzen 9 5950X 16-Core Processor
CPU family:                           25
Model:                                33
Thread(s) per core:                   2
Core(s) per socket:                   16
Socket(s):                            1
Stepping:                             0
Frequency boost:                      enabled
CPU max MHz:                          5084.0000
CPU min MHz:                          550.0000
BogoMIPS:                             6800.37
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm debug_swap
Virtualization:                       AMD-V
L1d cache:                            512 KiB (16 instances)
L1i cache:                            512 KiB (16 instances)
L2 cache:                             8 MiB (16 instances)
L3 cache:                             64 MiB (2 instances)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-31
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Vulnerable: Safe RET, no microcode
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.20+xpu
[pip3] numpy==1.24.4
[pip3] open-clip-torch==2.24.0
[pip3] torch==2.1.0.post0+cxx11.abi
[pip3] torchaudio==2.1.0.post0+cxx11.abi
[pip3] torchsde==0.2.6
[pip3] torchvision==0.16.0.post0+cxx11.abi
[conda] N/A

@simonlui, thanks for reporting; we will investigate and get back to you.

The error no longer appears with xpu-main at 78fe3c2, but I am now getting either a system freeze or a PI error -999 instead. I will investigate further on my end.
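As an aside, when switching between wheels and source builds like this it is easy to end up testing a stale install. A quick sanity check of what the environment actually loads (standard PyTorch/IPEX attributes, nothing specific to this issue) looks like:

```python
# Confirm which PyTorch/IPEX build the environment resolves and that the
# XPU device is visible before retesting a suspected regression.
import torch
import intel_extension_for_pytorch as ipex

print("torch:", torch.__version__)   # e.g. 2.1.0.post0+cxx11.abi
print("ipex:", ipex.__version__)     # e.g. the build/commit under test
print("xpu available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print(torch.xpu.get_device_properties(0))
```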

Hi @simonlui, thanks for sharing the updates; let us know how it goes or if you run into any further issues.

I think it was just something with my system at the time that caused that error. After retesting and verifying, I can confirm the original regression was fixed by some commit between v2.1.20+xpu and 78fe3c2; 0e5ab2a, where xpu-main sits at the time of writing, also works. Closing.