[Issue]: hipGetDeviceCount segfaults with 3+ devices
FilipVaverka opened this issue
Problem Description
The following simple HIP program causes a segmentation fault in libamdhip64.so.6 when both of these conditions are met:
- More than 2 devices are visible in the system (in my case: gfx906, gfx1036 and gfx1100), and
- The binary is linked against any of the roc* or hip* libraries (tested: rocfft, hipfft, rocblas and hipblas).
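#include <iostream>
#include <hip/hip_runtime.h>

int main()
{
    int count = 0;
    // Segfaults inside this call when the conditions above are met.
    hipError_t err = hipGetDeviceCount(&count);
    if (err != hipSuccess) {
        std::cerr << "hipGetDeviceCount failed: " << hipGetErrorString(err) << std::endl;
        return 1;
    }
    std::cout << "DEVICES: " << count << std::endl;
    return 0;
}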
The code is compiled simply with:
hipcc -L/opt/rocm/lib -lhipblas test_hip.cpp -o test
Here is the (not very useful) backtrace:
Thread 1 "test" received signal SIGSEGV, Segmentation fault.
0x00007ffff68aac40 in ?? ()
from /opt/rocm-6.0.2/lib/llvm/bin/../../../lib/libamdhip64.so.6
Missing separate debuginfos, use: zypper install comgr-debuginfo-2.6.0.60002-sles154.115.x86_64 hip-runtime-amd-debuginfo-6.0.32831.60002-sles154.115.x86_64 hsa-rocr-debuginfo-1.12.0.60002-sles154.115.x86_64 libdrm2-debuginfo-2.4.120-1.2.x86_64 libdrm_amdgpu1-debuginfo-2.4.120-1.2.x86_64 libelf1-debuginfo-0.190-1.2.x86_64 libgcc_s1-debuginfo-14.0.1+git8957-1.1.x86_64 libncurses6-debuginfo-6.4.20240210-31.1.x86_64 libnuma1-debuginfo-2.0.18.0.g3871b1c-1.1.x86_64 libstdc++6-debuginfo-14.0.1+git8957-1.2.x86_64 libz1-x86-64-v3-debuginfo-1.3-1.2.x86_64 libzstd1-x86-64-v3-debuginfo-1.5.5-5.2.x86_64
(gdb) bt
#0 0x00007ffff68aac40 in ?? ()
from /opt/rocm-6.0.2/lib/llvm/bin/../../../lib/libamdhip64.so.6
#1 0x00007ffff68abc4d in ?? ()
from /opt/rocm-6.0.2/lib/llvm/bin/../../../lib/libamdhip64.so.6
#2 0x00007ffff6866854 in ?? ()
from /opt/rocm-6.0.2/lib/llvm/bin/../../../lib/libamdhip64.so.6
#3 0x00007ffff69d2730 in ?? ()
from /opt/rocm-6.0.2/lib/llvm/bin/../../../lib/libamdhip64.so.6
#4 0x00007ffff686d9fd in ?? ()
from /opt/rocm-6.0.2/lib/llvm/bin/../../../lib/libamdhip64.so.6
#5 0x00007ffff6097e8f in __pthread_once_slow () from /lib64/libc.so.6
#6 0x00007ffff687a31a in ?? ()
from /opt/rocm-6.0.2/lib/llvm/bin/../../../lib/libamdhip64.so.6
#7 0x00007ffff6884e99 in hipGetDeviceCount ()
from /opt/rocm-6.0.2/lib/llvm/bin/../../../lib/libamdhip64.so.6
#8 0x0000000000201af6 in main ()
Limiting the visible devices to any two of mine (HIP_VISIBLE_DEVICES=A,B) resolves the issue. Not linking any of the ROCm math libraries also resolves it.
I wasn't able to replicate the issue with multiple (4 or 8) devices of the same architecture (MI100 and MI200).
Operating System
openSUSE Tumbleweed
CPU
AMD Ryzen 9 7950X 16-Core Processor
GPU
AMD Radeon VII, AMD Radeon RX 7900 XTX
ROCm Version
ROCm 6.0.0
ROCm Component
HIP, HIPCC
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: AMD Ryzen 9 7950X 16-Core Processor
Uuid: CPU-XX
Marketing Name: AMD Ryzen 9 7950X 16-Core Processor
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 5881
BDFID: 0
Internal Node ID: 0
Compute Unit: 32
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 64939256(0x3dee4f8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 64939256(0x3dee4f8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 64939256(0x3dee4f8) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx1100
Uuid: GPU-9ee3ff99542b5e0c
Marketing Name: AMD Radeon RX 7900 XTX
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2304
BDFID: 768
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 550
SDMA engine uCode:: 19
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 3
*******
Name: gfx906
Uuid: GPU-9f50716172fd5d40
Marketing Name: AMD Radeon VII
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 2
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 8192(0x2000) KB
Chip ID: 26287(0x66af)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1801
BDFID: 1792
Internal Node ID: 2
Compute Unit: 60
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 471
SDMA engine uCode:: 145
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 16760832(0xffc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx906:sramecc+:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*******
Agent 4
*******
Name: gfx1036
Uuid: GPU-XX
Marketing Name: AMD Radeon Graphics
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 3
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 256(0x100) KB
Chip ID: 5710(0x164e)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2200
BDFID: 28928
Internal Node ID: 3
Compute Unit: 2
SIMDs per CU: 2
Shader Engines: 1
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 20
SDMA engine uCode:: 9
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 524288(0x80000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1036
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
Hi @FilipVaverka,
Thanks for reporting the issue. This is a known internal issue that has been fixed in the upcoming ROCm 6.1 release.
@FilipVaverka, please re-test with ROCm 6.1.0. Thanks.
Closing the ticket. @FilipVaverka, please re-open if you still see this issue in ROCm 6.1.0. Thanks.
The issue is resolved in ROCm 6.1.0. I apologize for the delayed response.