ROCm / ROCm

AMD ROCm™ Software - GitHub Home

Home Page:https://rocm.docs.amd.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

[Issue]: python -c "import torch;print(torch.cuda.is_available())" returns False

Looong01 opened this issue · comments

Problem Description

I restrictly Follow the steps here to install amdgpu driver and rocm-6.1.0 and hip sdk.

rocminfo and rocm-smi and amd-smi run successfully.

But when I try to run pytorch with conda environment, it cannot detect any GPUs.

I also tried docker from rocm/pytorch in docker hub. and it also failed.

I also tried to install all the driver and rocm and hip by using AMDGPU installer, an it also failed.

(PyTorch) loong@home:~$ python -c "import torch;print(torch.cuda.is_available())"
False

Operating System

Ubuntu 22.04.4 LTS (Jammy Jellyfish)

CPU

Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz

GPU

AMD Radeon RX 7900 XTX

ROCm Version

ROCm 6.1.0

ROCm Component

No response

Steps to Reproduce

  1. Follow the steps here o install amdgpu driver and rocm-6.1.0 and hip sdk.
  2. conda create -n PyTorch python=3.10 -y
  3. conda activate PyTorch
  4. wget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.1/torch-2.1.2%2Brocm6.1-cp310-cp310-linux_x86_64.whl
  5. wget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.1/torchvision-0.16.1%2Brocm6.1-cp310-cp310-linux_x86_64.whl
  6. wget https://repo.radeon.com/rocm/manylinux/rocm-rel-6.1/pytorch_triton_rocm-2.1.0%2Brocm6.1.4d510c3a44-cp310-cp310-linux_x86_64.whl
  7. pip install --force-reinstall ./torch-2.1.2%2Brocm6.1-cp310-cp310-linux_x86_64.whl ./torchvision-0.16.1%2Brocm6.1-cp310-cp310-linux_x86_64.whl ./pytorch_triton_rocm-2.1.0%2Brocm6.1.4d510c3a44-cp310-cp310-linux_x86_64.whl
  8. python -c "import torch;print(torch.cuda.is_available())"

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

ROCk module version 6.7.0 is loaded

HSA System Attributes

Runtime Version: 1.13
Runtime Ext Version: 1.4
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES

==========
HSA Agents


Agent 1


Name: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Uuid: CPU-XX
Marketing Name: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 4700
BDFID: 0
Internal Node ID: 0
Compute Unit: 8
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 65781560(0x3ebbf38) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 65781560(0x3ebbf38) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 65781560(0x3ebbf38) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:


Agent 2


Name: gfx1100
Uuid: GPU-85631fd855c9cea1
Marketing Name: Radeon RX 7900 XTX
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 32(0x20) KB
L2: 6144(0x1800) KB
L3: 98304(0x18000) KB
Chip ID: 29772(0x744c)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 2482
BDFID: 768
Internal Node ID: 1
Compute Unit: 96
SIMDs per CU: 2
Shader Engines: 6
Shader Arrs. per Eng.: 2
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 32(0x20)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 32(0x20)
Max Work-item Per CU: 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 92
SDMA engine uCode:: 20
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 25149440(0x17fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Recommended Granule:2048KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Recommended Granule:0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx1100
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***

Additional Information

OS:
NAME="Ubuntu"
VERSION="22.04.4 LTS (Jammy Jellyfish)"

CPU:
model name : Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz

GPU:
Name: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Marketing Name: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Name: gfx1100
Marketing Name: Radeon RX 7900 XTX
Name: amdgcn-amd-amdhsa--gfx1100

This may help ROCm/pytorch#1398 (comment)

No help.

I got the same problem.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
does not work.

pip3 install --pre --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0
works! Maybe you need to --force-reinstall your package.

@Looong01 Have you tried --force-reinstall to see if it works? Thanks!

@Looong01 Have you tried --force-reinstall to see if it works? Thanks!

I tried and no help.

I got the same problem. pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0 does not work.

pip3 install --pre --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.0 works! Maybe you need to --force-reinstall your package.

I tried and no help. Thanks anyway

Btw, I have 2 user accounts(A and B) in my system(Ubuntu 22.04.4). All of those have sudo Permissions. A runs this successfully and returns True, but B doesn't.

And I think this may help:

$ getent group
root:x:0:
daemon:x:1:
bin:x:2:
sys:x:3:
adm:x:4:syslog,A
tty:x:5:
disk:x:6:
lp:x:7:
mail:x:8:
news:x:9:
uucp:x:10:
man:x:12:
proxy:x:13:
kmem:x:15:
dialout:x:20:
fax:x:21:
voice:x:22:
cdrom:x:24:A
floppy:x:25:
tape:x:26:
sudo:x:27:A,B
audio:x:29:
dip:x:30:A
www-data:x:33:
backup:x:34:
operator:x:37:
list:x:38:
irc:x:39:
src:x:40:
gnats:x:41:
shadow:x:42:
utmp:x:43:
video:x:44:A
sasl:x:45:
plugdev:x:46:A
staff:x:50:
games:x:60:
users:x:100:
nogroup:x:65534:
systemd-journal:x:101:
systemd-network:x:102:
systemd-resolve:x:103:
messagebus:x:104:
systemd-timesync:x:105:
input:x:106:
sgx:x:107:
kvm:x:108:
render:x:109:A
lxd:x:110:A
_ssh:x:111:
crontab:x:112:
syslog:x:113:
uuidd:x:114:
tcpdump:x:115:
tss:x:116:
landscape:x:117:
fwupd-refresh:x:118:
A:x:1000:
B:x:1001:

It looks to me like user B is not in the render group. At some point the user needed to be in the render and/or video group. I would suspect that this is the reason why user B is not working while user A is.

For pytorch I only use the instructions found on their page (https://pytorch.org/get-started/locally/), but I don't have your GPU.

sudo usermod -a -G render,video B or if logged in as user B: sudo usermod -a -G render,video $LOGNAME