oobabooga/text-generation-webui

A Gradio web UI for Large Language Models with support for multiple inference backends.

webui crashes after sending prompt

ndrew222 opened this issue

Describe the bug

The webui crashes as soon as a prompt is sent to a loaded model.

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

  1. ./start_linux.sh
  2. load a model
  3. send any prompt
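
To take Gradio out of the loop, here is a minimal sketch that exercises the bundled llama-cpp-python backend directly (run inside the webui's conda env; the model path and full offload mirror the log below, and the import name assumes the stock llama-cpp-python package):

# Hypothetical standalone repro, assuming the same env that start_linux.sh sets up.
from llama_cpp import Llama

llm = Llama(
    model_path="models/Phi-3-mini-4k-instruct-fp16.gguf",
    n_gpu_layers=-1,  # offload all layers, matching "offloaded 33/33 layers" in the log
    n_ctx=4096,
)
# If the ggml GPU kernels were not built for this GPU's architecture, the
# process should abort here with the same "invalid device function" error.
print(llm("Hello", max_tokens=8))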

Screenshot

No response

Logs

❯ ./start_linux.sh
14:01:15-573868 INFO     Starting Text generation web UI                                             

Running on local URL:  http://127.0.0.1:7860

14:01:45-565977 INFO     Loading "Phi-3-mini-4k-instruct-fp16.gguf"                                  
14:01:45-598096 INFO     llama.cpp weights detected: "models/Phi-3-mini-4k-instruct-fp16.gguf"       
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from models/Phi-3-mini-4k-instruct-fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                           llama.vocab_size u32              = 32064
llama_model_loader: - kv   3:                       llama.context_length u32              = 4096
llama_model_loader: - kv   4:                     llama.embedding_length u32              = 3072
llama_model_loader: - kv   5:                          llama.block_count u32              = 32
llama_model_loader: - kv   6:                  llama.feed_forward_length u32              = 8192
llama_model_loader: - kv   7:                 llama.rope.dimension_count u32              = 96
llama_model_loader: - kv   8:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   9:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv  10:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  11:                       llama.rope.freq_base f32              = 10000.000000
llama_model_loader: - kv  12:                          general.file_type u32              = 1
llama_model_loader: - kv  13:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  14:                      tokenizer.ggml.tokens arr[str,32064]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  15:                      tokenizer.ggml.scores arr[f32,32064]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  16:                  tokenizer.ggml.token_type arr[i32,32064]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  17:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  18:                tokenizer.ggml.eos_token_id u32              = 32000
llama_model_loader: - kv  19:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  20:            tokenizer.ggml.padding_token_id u32              = 32000
llama_model_loader: - kv  21:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  22:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  23:                    tokenizer.chat_template str              = {{ bos_token }}{% for message in mess...
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type  f16:  226 tensors
llm_load_vocab: control-looking token: '<|end|>' was not control-type; this is probably a bug in the model. its type will be overridden
llm_load_vocab: control-looking token: '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden
llm_load_vocab: special tokens cache size = 67
llm_load_vocab: token to piece cache size = 0.1691 MB
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32064
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: vocab_only       = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 3072
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_rot            = 96
llm_load_print_meta: n_swa            = 0
llm_load_print_meta: n_embd_head_k    = 96
llm_load_print_meta: n_embd_head_v    = 96
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 3072
llm_load_print_meta: n_embd_v_gqa     = 3072
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale    = 0.0e+00
llm_load_print_meta: n_ff             = 8192
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: causal attn      = 1
llm_load_print_meta: pooling type     = 0
llm_load_print_meta: rope type        = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: ssm_d_conv       = 0
llm_load_print_meta: ssm_d_inner      = 0
llm_load_print_meta: ssm_d_state      = 0
llm_load_print_meta: ssm_dt_rank      = 0
llm_load_print_meta: ssm_dt_b_c_rms   = 0
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = F16
llm_load_print_meta: model params     = 3.82 B
llm_load_print_meta: model size       = 7.12 GiB (16.00 BPW) 
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 32000 '<|endoftext|>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 32000 '<|endoftext|>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_print_meta: EOT token        = 32007 '<|end|>'
llm_load_print_meta: EOG token        = 32000 '<|endoftext|>'
llm_load_print_meta: EOG token        = 32007 '<|end|>'
llm_load_print_meta: max token length = 48
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 6900 XT, compute capability 10.3, VMM: no
llm_load_tensors: ggml ctx size =    0.27 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  7100.64 MiB
llm_load_tensors:        CPU buffer size =   187.88 MiB
.................................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: n_batch    = 512
llama_new_context_with_model: n_ubatch   = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =  1536.00 MiB
llama_new_context_with_model: KV self size  = 1536.00 MiB, K (f16):  768.00 MiB, V (f16):  768.00 MiB
llama_new_context_with_model:  ROCm_Host  output buffer size =     0.12 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   288.00 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =    14.01 MiB
llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 2
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | 
Model metadata: {'tokenizer.chat_template': "{{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + '\n' + message['content'] + '<|end|>\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>\n' }}{% else %}{{ eos_token }}{% endif %}", 'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.ggml.padding_token_id': '32000', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '32000', 'tokenizer.ggml.model': 'llama', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '4096', 'general.name': 'LLaMA v2', 'llama.vocab_size': '32064', 'general.file_type': '1', 'tokenizer.ggml.add_bos_token': 'true', 'llama.embedding_length': '3072', 'llama.feed_forward_length': '8192', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '96', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '32'}
Available chat formats from metadata: chat_template.default
Using gguf chat template: {{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + '
' + message['content'] + '<|end|>
' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
Using chat eos_token: <|endoftext|>
Using chat bos_token: <s>
14:01:47-732817 INFO     Loaded "Phi-3-mini-4k-instruct-fp16.gguf" in 2.17 seconds.                  
14:01:47-733607 INFO     LOADER: "llama.cpp"                                                         
14:01:47-734197 INFO     TRUNCATION LENGTH: 4096                                                     
14:01:47-734685 INFO     INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"               
ggml_cuda_compute_forward: RMS_NORM failed
CUDA error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda.cu:2368
  err
/home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda.cu:106: CUDA error
[New LWP 12809]
[New LWP 12789]
[New LWP 12788]
[New LWP 12751]
[New LWP 12716]
[New LWP 12715]
[New LWP 12714]
[New LWP 12713]
[New LWP 12712]
[New LWP 12711]
[New LWP 12710]
[New LWP 12709]
[New LWP 12708]
[New LWP 12707]
[New LWP 12706]
[New LWP 12705]
[New LWP 12704]
[New LWP 12703]
[New LWP 12702]
[New LWP 12701]
[New LWP 12700]
[New LWP 12699]

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fe8071e5c13 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#0  0x00007fe8071e5c13 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1  0x0000000000645275 in pysleep (timeout=<optimized out>) at /usr/local/src/conda/python-3.11.10/Modules/timemodule.c:2159
warning: 2159	/usr/local/src/conda/python-3.11.10/Modules/timemodule.c: No such file or directory
#2  time_sleep (self=<optimized out>, timeout_obj=<optimized out>) at /usr/local/src/conda/python-3.11.10/Modules/timemodule.c:383
383	in /usr/local/src/conda/python-3.11.10/Modules/timemodule.c
#3  0x0000000000511e46 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x8a7a38 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7fe8073fa020, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.10/Python/ceval.c:5020
warning: 5020	/usr/local/src/conda/python-3.11.10/Python/ceval.c: No such file or directory
#4  0x00000000005cc1ea in _PyEval_EvalFrame (throwflag=0, frame=0x7fe8073fa020, tstate=0x8a7a38 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.10/Include/internal/pycore_ceval.h:73
warning: 73	/usr/local/src/conda/python-3.11.10/Include/internal/pycore_ceval.h: No such file or directory
#5  _PyEval_Vector (tstate=tstate@entry=0x8a7a38 <_PyRuntime+166328>, func=func@entry=0x7fe8070987c0, locals=locals@entry=0x7fe8070f24c0, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.10/Python/ceval.c:6434
warning: 6434	/usr/local/src/conda/python-3.11.10/Python/ceval.c: No such file or directory
#6  0x00000000005cb8bf in PyEval_EvalCode (co=co@entry=0xbac8130, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0) at /usr/local/src/conda/python-3.11.10/Python/ceval.c:1148
1148	in /usr/local/src/conda/python-3.11.10/Python/ceval.c
#7  0x00000000005ec9e7 in run_eval_code_obj (tstate=tstate@entry=0x8a7a38 <_PyRuntime+166328>, co=co@entry=0xbac8130, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:1741
warning: 1741	/usr/local/src/conda/python-3.11.10/Python/pythonrun.c: No such file or directory
#8  0x00000000005e8580 in run_mod (mod=mod@entry=0xbae9900, filename=filename@entry=0x7fe80702d300, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0, flags=flags@entry=0x7fff954f7af8, arena=arena@entry=0x7fe80701b630) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:1762
1762	in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#9  0x00000000005fd4d2 in pyrun_file (fp=fp@entry=0xba23080, filename=filename@entry=0x7fe80702d300, start=start@entry=257, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0, closeit=closeit@entry=1, flags=0x7fff954f7af8) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:1657
1657	in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#10 0x00000000005fc89f in _PyRun_SimpleFileObject (fp=0xba23080, filename=0x7fe80702d300, closeit=1, flags=0x7fff954f7af8) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:440
440	in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#11 0x00000000005fc5c3 in _PyRun_AnyFileObject (fp=0xba23080, filename=filename@entry=0x7fe80702d300, closeit=closeit@entry=1, flags=flags@entry=0x7fff954f7af8) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:79
79	in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#12 0x00000000005f723e in pymain_run_file_obj (skip_source_first_line=0, filename=0x7fe80702d300, program_name=0x7fe8070f26b0) at /usr/local/src/conda/python-3.11.10/Modules/main.c:360
warning: 360	/usr/local/src/conda/python-3.11.10/Modules/main.c: No such file or directory
#13 pymain_run_file (config=0x88da80 <_PyRuntime+59904>) at /usr/local/src/conda/python-3.11.10/Modules/main.c:379
379	in /usr/local/src/conda/python-3.11.10/Modules/main.c
#14 pymain_run_python (exitcode=0x7fff954f7af0) at /usr/local/src/conda/python-3.11.10/Modules/main.c:605
605	in /usr/local/src/conda/python-3.11.10/Modules/main.c
#15 Py_RunMain () at /usr/local/src/conda/python-3.11.10/Modules/main.c:684
684	in /usr/local/src/conda/python-3.11.10/Modules/main.c
#16 0x00000000005bbf89 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.10/Modules/main.c:738
738	in /usr/local/src/conda/python-3.11.10/Modules/main.c
#17 0x00007fe80712c088 in __libc_start_call_main () from /lib64/libc.so.6
#18 0x00007fe80712c14b in __libc_start_main_impl () from /lib64/libc.so.6
#19 0x00000000005bbdd3 in _start ()
[Inferior 1 (process 12681) detached]
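
The abort originates in the ggml GPU backend: "CUDA error: invalid device function" generally means the compiled device kernels do not match the architecture of the GPU actually running them, and the path in the trace shows the library came from the llama-cpp-python-cuBLAS-wheels build. A hedged way to confirm which llama-cpp-python wheel the environment actually loaded (the distribution names checked here are upstream defaults and may differ in the webui's env):

# Run inside the webui's conda env. Prints the package version, the location
# the module loads from, and any installed llama-* distributions.
import importlib.metadata as md
import llama_cpp

print("version:", llama_cpp.__version__)
print("loaded from:", llama_cpp.__file__)
for dist in md.distributions():
    name = dist.metadata["Name"] or ""
    if "llama" in name.lower():
        print(name, dist.version)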

System Info

OS: Fedora 40
Kernel: 6.11.5-200.fc40.x86_64
CPU: AMD Ryzen 7 5800X (16) @ 4.85 GHz
GPU: AMD Radeon RX 6900 XT [Discrete]
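
As a further hedged check that the env is using a ROCm build of torch rather than a CUDA one (this assumes torch is present, which the one-click installer normally provides; torch.version.hip is a ROCm/HIP version string on ROCm builds and None on CUDA builds):

# Hedged environment check, run inside the webui's conda env.
import torch

print(torch.cuda.is_available())      # ROCm builds also report True via the CUDA API
print(torch.cuda.get_device_name(0))  # should name the RX 6900 XT
print(torch.version.hip)              # ROCm/HIP version on ROCm builds; None on CUDA builds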


rocminfo
========
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 5800X 8-Core Processor 
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 5800X 8-Core Processor 
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   4851                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    40969404(0x27124bc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    40969404(0x27124bc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    40969404(0x27124bc) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1030                            
  Uuid:                    GPU-762c9ecf002e0002               
  Marketing Name:          AMD Radeon RX 6900 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      4096(0x1000) KB                    
    L3:                      131072(0x20000) KB                 
  Chip ID:                 29615(0x73af)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2720                               
  BDFID:                   2816                               
  Internal Node ID:        1                                  
  Compute Unit:            80                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 120                                
  SDMA engine uCode::      83                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1030         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***    



rocm-clinfo
===========
Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.1 AMD-APP (3614.0)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon RX 6900 XT
  Device Topology:				 PCI[ B#11, D#0, F#0 ]
  Max compute units:				 40
  Max work items dimensions:			 3
    Max work items[0]:				 1024
    Max work items[1]:				 1024
    Max work items[2]:				 1024
  Max work group size:				 256
  Preferred vector width char:			 4
  Preferred vector width short:			 2
  Preferred vector width int:			 1
  Preferred vector width long:			 1
  Preferred vector width float:			 1
  Preferred vector width double:		 1
  Native vector width char:			 4
  Native vector width short:			 2
  Native vector width int:			 1
  Native vector width long:			 1
  Native vector width float:			 1
  Native vector width double:			 1
  Max clock frequency:				 2720Mhz
  Address bits:					 64
  Max memory allocation:			 14588628168
  Image support:				 Yes
  Max number of images read arguments:		 128
  Max number of images write arguments:		 8
  Max image 2D width:				 16384
  Max image 2D height:				 16384
  Max image 3D width:				 16384
  Max image 3D height:				 16384
  Max image 3D depth:				 8192
  Max samplers within kernel:			 16
  Max size of kernel argument:			 1024
  Alignment (bits) of base address:		 1024
  Minimum alignment (bytes) for any datatype:	 128
  Single precision floating point capability
    Denorms:					 Yes
    Quiet NaNs:					 Yes
    Round to nearest even:			 Yes
    Round to zero:				 Yes
    Round to +ve and infinity:			 Yes
    IEEE754-2008 fused multiply-add:		 Yes
  Cache type:					 Read/Write
  Cache line size:				 128
  Cache size:					 16384
  Global memory size:				 17163091968
  Constant buffer size:				 14588628168
  Max number of constant args:			 8
  Local memory type:				 Scratchpad
  Local memory size:				 65536
  Max pipe arguments:				 16
  Max pipe active reservations:			 16
  Max pipe packet size:				 1703726280
  Max global variable size:			 14588628168
  Max global variable preferred total size:	 17163091968
  Max read/write image args:			 64
  Max on device events:				 1024
  Queue on device max size:			 8388608
  Max on device queues:				 1
  Queue on device preferred size:		 262144
  SVM capabilities:				 
    Coarse grain buffer:			 Yes
    Fine grain buffer:				 Yes
    Fine grain system:				 No
    Atomics:					 No
  Preferred platform atomic alignment:		 0
  Preferred global atomic alignment:		 0
  Preferred local atomic alignment:		 0
  Kernel Preferred work group size multiple:	 32
  Error correction support:			 0
  Unified memory for Host and Device:		 0
  Profiling timer resolution:			 1
  Device endianess:				 Little
  Available:					 Yes
  Compiler available:				 Yes
  Execution capabilities:				 
    Execute OpenCL kernels:			 Yes
    Execute native function:			 No
  Queue on Host properties:				 
    Out-of-Order:				 No
    Profiling :					 Yes
  Queue on Device properties:				 
    Out-of-Order:				 Yes
    Profiling :					 Yes
  Platform ID:					 0x7efe81d1c7c8
  Name:						 gfx1030
  Vendor:					 Advanced Micro Devices, Inc.
  Device OpenCL C version:			 OpenCL C 2.0 
  Driver version:				 3614.0 (HSA1.1,LC)
  Profile:					 FULL_PROFILE
  Version:					 OpenCL 2.0 
  Extensions:					 cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program

I have the same issue with a 7900 XTX GPU.
Any progress on this?