ROCM 5.7 Workaround for AMD R7-5700G APU (PyTorch + OpenCL)
CardLin opened this issue · comments
Workaround solution
Install AMD ROCM 5.7 and driver:
sudo apt update
wget https://repo.radeon.com/amdgpu-install/23.20.00.48/ubuntu/jammy/amdgpu-install_5.7.00.48.50700-1_all.deb
sudo apt install ./amdgpu-install_5.7.00.48.50700-1_all.deb
sudo dpkg --add-architecture i386
Check the following file and uncomment the deb line if it is commented out:
/etc/apt/sources.list.d/amdgpu-proprietary.list
Install command:
sudo amdgpu-install -y --usecase=graphics,rocm,dkms,opencl,openclsdk,amf
sudo usermod -a -G render,video $LOGNAME
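The usermod change above only takes effect in a new login session, so it can be worth confirming the group membership before going further. Below is a minimal sketch of such a check in Python; the helper names missing_groups and current_group_names are hypothetical and only serve this illustration.

```python
import grp
import os

# Groups added by the usermod command above.
REQUIRED_GROUPS = ("render", "video")


def missing_groups(current_names, required=REQUIRED_GROUPS):
    """Return the required groups that are absent from current_names."""
    current = set(current_names)
    return [name for name in required if name not in current]


def current_group_names():
    """Names of the groups the running process belongs to (Unix only)."""
    names = []
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.append(grp.getgrgid(gid).gr_name)
        except KeyError:
            # gid without an entry in the group database; skip it
            continue
    return names


if __name__ == "__main__":
    missing = missing_groups(current_group_names())
    if missing:
        print("not yet in groups:", ", ".join(missing))
        print("log out and back in (or reboot) after running usermod")
    else:
        print("user is in render and video")
```

If a group is reported as missing right after running usermod, logging out and back in is usually enough.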
You can also use this command to list the available use cases and choose what to install:
sudo amdgpu-install --list-usecase
Now you can use Anaconda or another virtual environment to install PyTorch:
conda create --name pytorch2.2.1-rocm5.7-py311 python=3.11
conda activate pytorch2.2.1-rocm5.7-py311
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
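To confirm that pip picked up a ROCm wheel rather than a CPU or CUDA one, the build metadata can be inspected without touching the GPU. This is a small sketch, not part of the original workaround; describe_torch_build and is_rocm_build are hypothetical helper names, while torch.__version__ and torch.version.hip are standard PyTorch attributes.

```python
import importlib.util


def describe_torch_build():
    """Return a dict describing the installed PyTorch build, or None if torch is missing."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch": torch.__version__,
        # torch.version.hip is a version string on ROCm builds and None otherwise
        "hip": getattr(torch.version, "hip", None),
    }


def is_rocm_build(info):
    """True when the description comes from a ROCm (HIP) build of PyTorch."""
    return bool(info) and info.get("hip") is not None


if __name__ == "__main__":
    info = describe_torch_build()
    if info is None:
        print("torch is not installed in this environment")
    else:
        print(f"torch {info['torch']}, HIP {info['hip']}")
        print("ROCm build" if is_rocm_build(info) else "not a ROCm build")
```

Run it inside the activated conda environment; it only reads version information, so it does not need the environment variables described next.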
Before running a PyTorch script or an OpenCL program, export these variables:
export HSA_OVERRIDE_GFX_VERSION=9.0.0
export HCC_SERIALIZE_KERNEL=0x3
export HCC_SERIALIZE_COPY=0x3
export HIP_TRACE_API=0x2
export HSA_ENABLE_INTERRUPT=0
export HSA_ENABLE_SDMA=0
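For PyTorch scripts, the same variables can also be set from Python, as long as this happens before torch is imported. The sketch below assumes the ROCm runtime reads them when it initializes; apply_workaround_env is a hypothetical helper, and the values are copied from the export lines above.

```python
import os

# Values copied from the export lines above.
WORKAROUND_ENV = {
    "HSA_OVERRIDE_GFX_VERSION": "9.0.0",
    "HCC_SERIALIZE_KERNEL": "0x3",
    "HCC_SERIALIZE_COPY": "0x3",
    "HIP_TRACE_API": "0x2",
    "HSA_ENABLE_INTERRUPT": "0",
    "HSA_ENABLE_SDMA": "0",
}


def apply_workaround_env(environ=os.environ):
    """Set the workaround variables, keeping any value that is already set."""
    for name, value in WORKAROUND_ENV.items():
        environ.setdefault(name, value)
    return environ


def main():
    apply_workaround_env()  # must run before torch is imported
    try:
        import torch
    except ImportError:
        print("torch is not installed in this environment")
        return
    if not torch.cuda.is_available():
        print("no GPU visible to PyTorch")
        return
    # ROCm builds of PyTorch expose the GPU through the torch.cuda API
    print("device:", torch.cuda.get_device_name(0))
    x = torch.ones(1024, device="cuda")
    print("sum on GPU:", float(x.sum()))


if __name__ == "__main__":
    main()
```

Using setdefault means a value exported in the shell takes precedence, which makes it easy to try the script with individual flags changed.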
Try clpeak to test:
sudo apt install clpeak
clpeak
Hi @CardLin, thanks for creating this issue. Are you reporting an issue, or do you just want to share information with the rest of the community? Thanks.
Hi @nartmada, at first I just wanted to share this workaround for getting ROCm and OpenCL to work on Cezanne.
I think users who choose Cezanne to run their code do so only for development or testing purposes.
In my case, I just want to test the ROCm PyTorch environment to check whether it is fully optimized.
It would be good if the problem were fixed in a future ROCm release.
The workaround environment flags have a big performance impact:
the bandwidth of enqueueReadBuffer is only about 1/3 of enqueueWriteBuffer.
Typically, read performance is slightly better than write performance.
But I don't have a normal clpeak benchmark result for my Cezanne to confirm my guess.
So I think this is a very low-priority issue. Whether to fix it depends on the ROCm team's workload.
Thanks!!
$ clpeak
Platform: AMD Accelerated Parallel Processing
Device: gfx900:xnack-
Driver version : 3590.0 (HSA1.1,LC) (Linux x64)
Compute units : 8
Clock frequency : 2000 MHz
Global memory bandwidth (GBPS)
float : 43.75
float2 : 44.68
float4 : 45.44
float8 : 45.66
float16 : 47.24
Single-precision compute (GFLOPS)
float : 2012.21
float2 : 2001.83
float4 : 2002.04
float8 : 1985.05
float16 : 1952.34
Half-precision compute (GFLOPS)
half : 2006.41
half2 : 3936.91
half4 : 3882.98
half8 : 3690.92
half16 : 1476.95
Double-precision compute (GFLOPS)
double : 127.40
double2 : 127.32
double4 : 127.26
double8 : 126.94
double16 : 126.35
Integer compute (GIOPS)
int : 405.88
int2 : 405.87
int4 : 378.81
int8 : 404.62
int16 : 403.12
Integer compute Fast 24bit (GIOPS)
int : 1965.63
int2 : 1965.38
int4 : 1966.70
int8 : 1932.46
int16 : 1753.18
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 16.68
enqueueReadBuffer : 5.91
enqueueWriteBuffer non-blocking : 16.60
enqueueReadBuffer non-blocking : 6.01
enqueueMapBuffer(for read) : 438261.94
memcpy from mapped ptr : 5.95
enqueueUnmap(after write) : 933688.56
memcpy to mapped ptr : 16.44
Kernel launch latency : 2.88 us
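The read/write gap described in the comment can be checked directly from the transfer figures in this output. A minimal sketch, with the numbers copied from the clpeak run above (read_write_ratio is a hypothetical helper name):

```python
# Transfer bandwidth figures (GBPS) copied from the clpeak output above.
TRANSFER_GBPS = {
    "enqueueWriteBuffer": 16.68,
    "enqueueReadBuffer": 5.91,
    "enqueueWriteBuffer non-blocking": 16.60,
    "enqueueReadBuffer non-blocking": 6.01,
}


def read_write_ratio(results, suffix=""):
    """Read bandwidth divided by write bandwidth for one transfer mode."""
    read = results["enqueueReadBuffer" + suffix]
    write = results["enqueueWriteBuffer" + suffix]
    return read / write


if __name__ == "__main__":
    for label, suffix in (("blocking", ""), ("non-blocking", " non-blocking")):
        ratio = read_write_ratio(TRANSFER_GBPS, suffix)
        print(f"{label}: read/write = {ratio:.2f}")
```

Both ratios come out at roughly 0.35, which matches the "about 1/3" estimate for read versus write bandwidth.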
I closed the issue by mistake. Reopening it...
Hi @CardLin, unfortunately the 5700G APU is not supported hardware, and the workaround environment flags will have a performance impact. Sorry, there will be no fix in a future ROCm release for this Cezanne APU issue. Closing the ticket.