Web UI crashes after sending a prompt
ndrew222 opened this issue
ndrew222 commented
Describe the bug
The web UI crashes after sending a prompt.
Is there an existing issue for this?
- I have searched the existing issues
Reproduction
- Run ./start_linux.sh
- Load a model
- Send any prompt (a standalone reproduction sketch follows below)
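To help narrow the crash down to the bundled llama-cpp-python wheel rather than the web UI itself, here is a minimal sketch of the same load-and-generate step done directly with llama-cpp-python. It assumes the conda environment that start_linux.sh activates (where llama_cpp is installed) and uses the same model file as in the logs; adjust the path if yours differs.

```python
# Minimal sketch: load the same GGUF directly with llama-cpp-python and run
# one prompt, bypassing the web UI. Assumes the environment set up by
# start_linux.sh is active and that the model path below matches your setup.
from llama_cpp import Llama

llm = Llama(
    model_path="models/Phi-3-mini-4k-instruct-fp16.gguf",  # same file as in the logs
    n_gpu_layers=-1,  # offload all layers to the GPU, as the web UI did (33/33)
    n_ctx=4096,
)

# If the failure is inside llama.cpp's ROCm kernels, this first generation
# should hit the same "invalid device function" error.
out = llm("Hello", max_tokens=16)
print(out["choices"][0]["text"])
```

If this crashes the same way, the problem is in the llama-cpp-python wheel (or its ROCm build) rather than in the web UI code.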
Screenshot
No response
Logs
❯ ./start_linux.sh
14:01:15-573868 INFO Starting Text generation web UI
Running on local URL: http://127.0.0.1:7860
14:01:45-565977 INFO Loading "Phi-3-mini-4k-instruct-fp16.gguf"
14:01:45-598096 INFO llama.cpp weights detected: "models/Phi-3-mini-4k-instruct-fp16.gguf"
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from models/Phi-3-mini-4k-instruct-fp16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = LLaMA v2
llama_model_loader: - kv 2: llama.vocab_size u32 = 32064
llama_model_loader: - kv 3: llama.context_length u32 = 4096
llama_model_loader: - kv 4: llama.embedding_length u32 = 3072
llama_model_loader: - kv 5: llama.block_count u32 = 32
llama_model_loader: - kv 6: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 7: llama.rope.dimension_count u32 = 96
llama_model_loader: - kv 8: llama.attention.head_count u32 = 32
llama_model_loader: - kv 9: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 10: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 11: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 12: general.file_type u32 = 1
llama_model_loader: - kv 13: tokenizer.ggml.model str = llama
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,32064] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 15: tokenizer.ggml.scores arr[f32,32064] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,32064] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 32000
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 32000
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 226 tensors
llm_load_vocab: control-looking token: '<|end|>' was not control-type; this is probably a bug in the model. its type will be overridden
llm_load_vocab: control-looking token: '<|endoftext|>' was not control-type; this is probably a bug in the model. its type will be overridden
llm_load_vocab: special tokens cache size = 67
llm_load_vocab: token to piece cache size = 0.1691 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32064
llm_load_print_meta: n_merges = 0
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_embd = 3072
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_rot = 96
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 96
llm_load_print_meta: n_embd_head_v = 96
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 3072
llm_load_print_meta: n_embd_v_gqa = 3072
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 4096
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = F16
llm_load_print_meta: model params = 3.82 B
llm_load_print_meta: model size = 7.12 GiB (16.00 BPW)
llm_load_print_meta: general.name = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 32000 '<|endoftext|>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 32000 '<|endoftext|>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_print_meta: EOT token = 32007 '<|end|>'
llm_load_print_meta: EOG token = 32000 '<|endoftext|>'
llm_load_print_meta: EOG token = 32007 '<|end|>'
llm_load_print_meta: max token length = 48
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 6900 XT, compute capability 10.3, VMM: no
llm_load_tensors: ggml ctx size = 0.27 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: ROCm0 buffer size = 7100.64 MiB
llm_load_tensors: CPU buffer size = 187.88 MiB
.................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: ROCm0 KV buffer size = 1536.00 MiB
llama_new_context_with_model: KV self size = 1536.00 MiB, K (f16): 768.00 MiB, V (f16): 768.00 MiB
llama_new_context_with_model: ROCm_Host output buffer size = 0.12 MiB
llama_new_context_with_model: ROCm0 compute buffer size = 288.00 MiB
llama_new_context_with_model: ROCm_Host compute buffer size = 14.01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 2
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | RISCV_VECT = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
Model metadata: {'tokenizer.chat_template': "{{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + '\n' + message['content'] + '<|end|>\n' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>\n' }}{% else %}{{ eos_token }}{% endif %}", 'tokenizer.ggml.add_eos_token': 'false', 'tokenizer.ggml.padding_token_id': '32000', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.eos_token_id': '32000', 'tokenizer.ggml.model': 'llama', 'general.architecture': 'llama', 'llama.rope.freq_base': '10000.000000', 'llama.context_length': '4096', 'general.name': 'LLaMA v2', 'llama.vocab_size': '32064', 'general.file_type': '1', 'tokenizer.ggml.add_bos_token': 'true', 'llama.embedding_length': '3072', 'llama.feed_forward_length': '8192', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.dimension_count': '96', 'tokenizer.ggml.bos_token_id': '1', 'llama.attention.head_count': '32', 'llama.block_count': '32', 'llama.attention.head_count_kv': '32'}
Available chat formats from metadata: chat_template.default
Using gguf chat template: {{ bos_token }}{% for message in messages %}{{'<|' + message['role'] + '|>' + '
' + message['content'] + '<|end|>
' }}{% endfor %}{% if add_generation_prompt %}{{ '<|assistant|>
' }}{% else %}{{ eos_token }}{% endif %}
Using chat eos_token: <|endoftext|>
Using chat bos_token: <s>
14:01:47-732817 INFO Loaded "Phi-3-mini-4k-instruct-fp16.gguf" in 2.17 seconds.
14:01:47-733607 INFO LOADER: "llama.cpp"
14:01:47-734197 INFO TRUNCATION LENGTH: 4096
14:01:47-734685 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
ggml_cuda_compute_forward: RMS_NORM failed
CUDA error: invalid device function
current device: 0, in function ggml_cuda_compute_forward at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda.cu:2368
err
/home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml/src/ggml-cuda.cu:106: CUDA error
[New LWP 12809]
[New LWP 12789]
[New LWP 12788]
[New LWP 12751]
[New LWP 12716]
[New LWP 12715]
[New LWP 12714]
[New LWP 12713]
[New LWP 12712]
[New LWP 12711]
[New LWP 12710]
[New LWP 12709]
[New LWP 12708]
[New LWP 12707]
[New LWP 12706]
[New LWP 12705]
[New LWP 12704]
[New LWP 12703]
[New LWP 12702]
[New LWP 12701]
[New LWP 12700]
[New LWP 12699]
This GDB supports auto-downloading debuginfo from the following URLs:
<https://debuginfod.fedoraproject.org/>
Enable debuginfod for this session? (y or [n]) [answered N; input not from terminal]
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007fe8071e5c13 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#0 0x00007fe8071e5c13 in clock_nanosleep@GLIBC_2.2.5 () from /lib64/libc.so.6
#1 0x0000000000645275 in pysleep (timeout=<optimized out>) at /usr/local/src/conda/python-3.11.10/Modules/timemodule.c:2159
warning: 2159 /usr/local/src/conda/python-3.11.10/Modules/timemodule.c: No such file or directory
#2 time_sleep (self=<optimized out>, timeout_obj=<optimized out>) at /usr/local/src/conda/python-3.11.10/Modules/timemodule.c:383
383 in /usr/local/src/conda/python-3.11.10/Modules/timemodule.c
#3 0x0000000000511e46 in _PyEval_EvalFrameDefault (tstate=tstate@entry=0x8a7a38 <_PyRuntime+166328>, frame=<optimized out>, frame@entry=0x7fe8073fa020, throwflag=throwflag@entry=0) at /usr/local/src/conda/python-3.11.10/Python/ceval.c:5020
warning: 5020 /usr/local/src/conda/python-3.11.10/Python/ceval.c: No such file or directory
#4 0x00000000005cc1ea in _PyEval_EvalFrame (throwflag=0, frame=0x7fe8073fa020, tstate=0x8a7a38 <_PyRuntime+166328>) at /usr/local/src/conda/python-3.11.10/Include/internal/pycore_ceval.h:73
warning: 73 /usr/local/src/conda/python-3.11.10/Include/internal/pycore_ceval.h: No such file or directory
#5 _PyEval_Vector (tstate=tstate@entry=0x8a7a38 <_PyRuntime+166328>, func=func@entry=0x7fe8070987c0, locals=locals@entry=0x7fe8070f24c0, args=args@entry=0x0, argcount=argcount@entry=0, kwnames=kwnames@entry=0x0) at /usr/local/src/conda/python-3.11.10/Python/ceval.c:6434
warning: 6434 /usr/local/src/conda/python-3.11.10/Python/ceval.c: No such file or directory
#6 0x00000000005cb8bf in PyEval_EvalCode (co=co@entry=0xbac8130, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0) at /usr/local/src/conda/python-3.11.10/Python/ceval.c:1148
1148 in /usr/local/src/conda/python-3.11.10/Python/ceval.c
#7 0x00000000005ec9e7 in run_eval_code_obj (tstate=tstate@entry=0x8a7a38 <_PyRuntime+166328>, co=co@entry=0xbac8130, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:1741
warning: 1741 /usr/local/src/conda/python-3.11.10/Python/pythonrun.c: No such file or directory
#8 0x00000000005e8580 in run_mod (mod=mod@entry=0xbae9900, filename=filename@entry=0x7fe80702d300, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0, flags=flags@entry=0x7fff954f7af8, arena=arena@entry=0x7fe80701b630) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:1762
1762 in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#9 0x00000000005fd4d2 in pyrun_file (fp=fp@entry=0xba23080, filename=filename@entry=0x7fe80702d300, start=start@entry=257, globals=globals@entry=0x7fe8070f24c0, locals=locals@entry=0x7fe8070f24c0, closeit=closeit@entry=1, flags=0x7fff954f7af8) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:1657
1657 in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#10 0x00000000005fc89f in _PyRun_SimpleFileObject (fp=0xba23080, filename=0x7fe80702d300, closeit=1, flags=0x7fff954f7af8) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:440
440 in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#11 0x00000000005fc5c3 in _PyRun_AnyFileObject (fp=0xba23080, filename=filename@entry=0x7fe80702d300, closeit=closeit@entry=1, flags=flags@entry=0x7fff954f7af8) at /usr/local/src/conda/python-3.11.10/Python/pythonrun.c:79
79 in /usr/local/src/conda/python-3.11.10/Python/pythonrun.c
#12 0x00000000005f723e in pymain_run_file_obj (skip_source_first_line=0, filename=0x7fe80702d300, program_name=0x7fe8070f26b0) at /usr/local/src/conda/python-3.11.10/Modules/main.c:360
warning: 360 /usr/local/src/conda/python-3.11.10/Modules/main.c: No such file or directory
#13 pymain_run_file (config=0x88da80 <_PyRuntime+59904>) at /usr/local/src/conda/python-3.11.10/Modules/main.c:379
379 in /usr/local/src/conda/python-3.11.10/Modules/main.c
#14 pymain_run_python (exitcode=0x7fff954f7af0) at /usr/local/src/conda/python-3.11.10/Modules/main.c:605
605 in /usr/local/src/conda/python-3.11.10/Modules/main.c
#15 Py_RunMain () at /usr/local/src/conda/python-3.11.10/Modules/main.c:684
684 in /usr/local/src/conda/python-3.11.10/Modules/main.c
#16 0x00000000005bbf89 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at /usr/local/src/conda/python-3.11.10/Modules/main.c:738
738 in /usr/local/src/conda/python-3.11.10/Modules/main.c
#17 0x00007fe80712c088 in __libc_start_call_main () from /lib64/libc.so.6
#18 0x00007fe80712c14b in __libc_start_main_impl () from /lib64/libc.so.6
#19 0x00000000005bbdd3 in _start ()
[Inferior 1 (process 12681) detached]
System Info
OS: Fedora 40
Kernel: 6.11.5-200.fc40.x86_64
CPU: AMD Ryzen 7 5800X (16) @ 4.85 GHz
GPU: AMD Radeon RX 6900 XT [Discrete]
rocminfo
========
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
Runtime Ext Version: 1.4
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 7 5800X 8-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 7 5800X 8-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4851
BDFID: 0
Internal Node ID: 0
Compute Unit: 16
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 40969404(0x27124bc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 40969404(0x27124bc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 40969404(0x27124bc) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1030
Uuid: GPU-762c9ecf002e0002
Marketing Name: AMD Radeon RX 6900 XT
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 4096(0x1000) KB
L3: 131072(0x20000) KB
Chip ID: 29615(0x73af)
ASIC Revision: 1(0x1)
Cacheline Size: 128(0x80)
Max Clock Freq. (MHz): 2720
BDFID: 2816
Internal Node ID: 1
Compute Unit: 80
SIMDs per CU: 2
Shader Engines: 4
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 120
SDMA engine uCode:: 83
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1030
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
rocm-clinfo
===========
Number of platforms: 1
Platform Profile: FULL_PROFILE
Platform Version: OpenCL 2.1 AMD-APP (3614.0)
Platform Name: AMD Accelerated Parallel Processing
Platform Vendor: Advanced Micro Devices, Inc.
Platform Extensions: cl_khr_icd cl_amd_event_callback
Platform Name: AMD Accelerated Parallel Processing
Number of devices: 1
Device Type: CL_DEVICE_TYPE_GPU
Vendor ID: 1002h
Board name: AMD Radeon RX 6900 XT
Device Topology: PCI[ B#11, D#0, F#0 ]
Max compute units: 40
Max work items dimensions: 3
Max work items[0]: 1024
Max work items[1]: 1024
Max work items[2]: 1024
Max work group size: 256
Preferred vector width char: 4
Preferred vector width short: 2
Preferred vector width int: 1
Preferred vector width long: 1
Preferred vector width float: 1
Preferred vector width double: 1
Native vector width char: 4
Native vector width short: 2
Native vector width int: 1
Native vector width long: 1
Native vector width float: 1
Native vector width double: 1
Max clock frequency: 2720Mhz
Address bits: 64
Max memory allocation: 14588628168
Image support: Yes
Max number of images read arguments: 128
Max number of images write arguments: 8
Max image 2D width: 16384
Max image 2D height: 16384
Max image 3D width: 16384
Max image 3D height: 16384
Max image 3D depth: 8192
Max samplers within kernel: 16
Max size of kernel argument: 1024
Alignment (bits) of base address: 1024
Minimum alignment (bytes) for any datatype: 128
Single precision floating point capability
Denorms: Yes
Quiet NaNs: Yes
Round to nearest even: Yes
Round to zero: Yes
Round to +ve and infinity: Yes
IEEE754-2008 fused multiply-add: Yes
Cache type: Read/Write
Cache line size: 128
Cache size: 16384
Global memory size: 17163091968
Constant buffer size: 14588628168
Max number of constant args: 8
Local memory type: Scratchpad
Local memory size: 65536
Max pipe arguments: 16
Max pipe active reservations: 16
Max pipe packet size: 1703726280
Max global variable size: 14588628168
Max global variable preferred total size: 17163091968
Max read/write image args: 64
Max on device events: 1024
Queue on device max size: 8388608
Max on device queues: 1
Queue on device preferred size: 262144
SVM capabilities:
Coarse grain buffer: Yes
Fine grain buffer: Yes
Fine grain system: No
Atomics: No
Preferred platform atomic alignment: 0
Preferred global atomic alignment: 0
Preferred local atomic alignment: 0
Kernel Preferred work group size multiple: 32
Error correction support: 0
Unified memory for Host and Device: 0
Profiling timer resolution: 1
Device endianess: Little
Available: Yes
Compiler available: Yes
Execution capabilities:
Execute OpenCL kernels: Yes
Execute native function: No
Queue on Host properties:
Out-of-Order: No
Profiling : Yes
Queue on Device properties:
Out-of-Order: Yes
Profiling : Yes
Platform ID: 0x7efe81d1c7c8
Name: gfx1030
Vendor: Advanced Micro Devices, Inc.
Device OpenCL C version: OpenCL C 2.0
Driver version: 3614.0 (HSA1.1,LC)
Profile: FULL_PROFILE
Version: OpenCL 2.0
Extensions: cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program
Leonidas Arvanitis commented
I have the same issue with a 7900 XTX GPU.
Any progress on this?