ROCm / ROCm

AMD ROCm™ Software - GitHub Home

Home Page:https://rocm.docs.amd.com

ROCM 5.7 Workaround for AMD R7-5700G APU (PyTorch + OpenCL)

CardLin opened this issue · comments

Workaround solution

Install AMD ROCm 5.7 and the driver:

sudo apt update
wget https://repo.radeon.com/amdgpu-install/23.20.00.48/ubuntu/jammy/amdgpu-install_5.7.00.48.50700-1_all.deb
sudo apt install ./amdgpu-install_5.7.00.48.50700-1_all.deb
sudo dpkg --add-architecture i386

Check the apt source file below and uncomment its deb line:
/etc/apt/sources.list.d/amdgpu-proprietary.list

Install command:

sudo amdgpu-install -y --usecase=graphics,rocm,dkms,opencl,openclsdk,amf
sudo usermod -a -G render,video $LOGNAME

You can also use this command to list the available use cases and decide what to install:
sudo amdgpu-install --list-usecase
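
The usermod command above adds the user to the render and video groups, but group changes normally only apply to sessions started after the change (log out and back in, or reboot). A minimal sketch for checking whether both groups are active in the current session follows. The helper name missing_groups is hypothetical and not part of ROCm.

```python
import grp
import os

# Groups that the usermod step above adds the user to.
REQUIRED_GROUPS = ("render", "video")


def missing_groups(active_gids, required=REQUIRED_GROUPS):
    """Return the required group names whose GID is not in active_gids."""
    missing = []
    for name in required:
        try:
            gid = grp.getgrnam(name).gr_gid
        except KeyError:
            # The group does not exist on this system at all.
            missing.append(name)
            continue
        if gid not in active_gids:
            missing.append(name)
    return missing


# GIDs that apply to the current process (supplementary plus effective).
active = set(os.getgroups()) | {os.getegid()}
absent = missing_groups(active)
if absent:
    print("Not active in this session:", ", ".join(absent))
else:
    print("render and video are both active in this session")
```

If a group is reported as not active right after running usermod, logging out and back in is usually enough.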

Now you can use Anaconda or another virtual environment to install PyTorch:

conda create --name pytorch2.2.1-rocm5.7-py311 python=3.11
conda activate pytorch2.2.1-rocm5.7-py311
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7
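
To confirm that the installed wheel is a ROCm build and that it sees the GPU, a short check like the one below may help. It is only a sketch: the function name torch_rocm_report is hypothetical, and on this APU the environment variables from the export step presumably need to be set in the shell before running it. ROCm builds of PyTorch expose the GPU through the torch.cuda API, and torch.version.hip is None on non-ROCm builds.

```python
import importlib.util


def torch_rocm_report():
    """Collect basic facts about the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    report = {
        "installed": True,
        "torch_version": torch.__version__,
        # HIP version string on ROCm builds, None otherwise.
        "hip_version": torch.version.hip,
        # ROCm devices are reported through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
    }
    if report["gpu_available"]:
        report["device_name"] = torch.cuda.get_device_name(0)
    return report


for key, value in torch_rocm_report().items():
    print(f"{key}: {value}")
```

If gpu_available is False while hip_version is set, the wheel is a ROCm build but the runtime did not find a usable device.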

Before running a PyTorch script or an OpenCL program, export these variables:

export HSA_OVERRIDE_GFX_VERSION=9.0.0
export HCC_SERIALIZE_KERNEL=0x3
export HCC_SERIALIZE_COPY=0x3
export HIP_TRACE_API=0x2
export HSA_ENABLE_INTERRUPT=0
export HSA_ENABLE_SDMA=0
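
Instead of exporting the variables in every shell, they can be applied to a single command only. The sketch below copies the values from the export lines above into a child process environment. The names WORKAROUND_ENV, workaround_env and run_with_workaround are hypothetical helpers, not part of ROCm.

```python
import os
import subprocess

# Values copied from the export lines above.
WORKAROUND_ENV = {
    # Matches the gfx900 device name that clpeak reports in this thread.
    "HSA_OVERRIDE_GFX_VERSION": "9.0.0",
    "HCC_SERIALIZE_KERNEL": "0x3",
    "HCC_SERIALIZE_COPY": "0x3",
    "HIP_TRACE_API": "0x2",
    "HSA_ENABLE_INTERRUPT": "0",
    "HSA_ENABLE_SDMA": "0",
}


def workaround_env(base=None):
    """Return a copy of the base environment with the workaround applied."""
    env = dict(os.environ if base is None else base)
    env.update(WORKAROUND_ENV)
    return env


def run_with_workaround(command):
    """Run command (a list of arguments) with the workaround variables set."""
    completed = subprocess.run(command, env=workaround_env(), check=False)
    return completed.returncode
```

For example, run_with_workaround(["clpeak"]) or run_with_workaround(["python", "train.py"]), where train.py stands for any PyTorch script. The parent shell stays unchanged, so runs with and without the flags can be compared.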

Try clpeak to test:

sudo apt install clpeak
clpeak

Attach any links, screenshots, or additional evidence you think will be helpful.

No response

Hi @CardLin, thanks for creating this issue. Are you reporting a problem, or do you just want to share information with the rest of the community? Thanks.

Hi @nartmada, at first I just wanted to share this workaround for getting ROCm and OpenCL to work on Cezanne.

I think users who choose Cezanne to run their code do so only for development or testing purposes.

For my part, I just want to test the ROCm PyTorch environment to check whether it is fully optimized.

It would be good if the problem were fixed in a future ROCm release.

The performance impact of adding these workaround environment flags is large.

It leaves enqueueReadBuffer at only about 1/3 of the enqueueWriteBuffer bandwidth.

Typically, read performance is slightly better than write performance.

However, I don't have a normal clpeak benchmark result for my Cezanne to confirm my guess.

So I think this is a very low-priority issue. Whether to fix it depends on the ROCm team's workload.

Thanks!!

$ clpeak
 
Platform: AMD Accelerated Parallel Processing
  Device: gfx900:xnack-
    Driver version  : 3590.0 (HSA1.1,LC) (Linux x64)
    Compute units   : 8
    Clock frequency : 2000 MHz

    Global memory bandwidth (GBPS)
      float   : 43.75
      float2  : 44.68
      float4  : 45.44
      float8  : 45.66
      float16 : 47.24

    Single-precision compute (GFLOPS)
      float   : 2012.21
      float2  : 2001.83
      float4  : 2002.04
      float8  : 1985.05
      float16 : 1952.34

    Half-precision compute (GFLOPS)
      half   : 2006.41
      half2  : 3936.91
      half4  : 3882.98
      half8  : 3690.92
      half16 : 1476.95

    Double-precision compute (GFLOPS)
      double   : 127.40
      double2  : 127.32
      double4  : 127.26
      double8  : 126.94
      double16 : 126.35

    Integer compute (GIOPS)
      int   : 405.88
      int2  : 405.87
      int4  : 378.81
      int8  : 404.62
      int16 : 403.12

    Integer compute Fast 24bit (GIOPS)
      int   : 1965.63
      int2  : 1965.38
      int4  : 1966.70
      int8  : 1932.46
      int16 : 1753.18

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 16.68
      enqueueReadBuffer               : 5.91
      enqueueWriteBuffer non-blocking : 16.60
      enqueueReadBuffer non-blocking  : 6.01
      enqueueMapBuffer(for read)      : 438261.94
        memcpy from mapped ptr        : 5.95
      enqueueUnmap(after write)       : 933688.56
        memcpy to mapped ptr          : 16.44

    Kernel launch latency : 2.88 us
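
As a quick check of the "about 1/3" figure, the numbers from the clpeak output above can be plugged into a few lines of arithmetic. The theoretical FP32 peak rests on two assumptions that are not stated in the thread: 64 FP32 lanes per compute unit on this Vega-class GPU, and 2 FLOP per lane per cycle from fused multiply-add.

```python
# Figures copied from the clpeak output above.
WRITE_GBPS = 16.68             # enqueueWriteBuffer
READ_GBPS = 5.91               # enqueueReadBuffer
MEASURED_FP32_GFLOPS = 2012.21
COMPUTE_UNITS = 8
CLOCK_GHZ = 2.0                # 2000 MHz

# Assumptions (not from the thread): Vega-class CU layout and FMA throughput.
LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CYCLE = 2


def read_write_ratio(read_gbps, write_gbps):
    """Read bandwidth as a fraction of write bandwidth."""
    return read_gbps / write_gbps


def peak_fp32_gflops(compute_units, clock_ghz):
    """Theoretical single-precision peak under the assumptions above."""
    return compute_units * LANES_PER_CU * FLOPS_PER_LANE_PER_CYCLE * clock_ghz


ratio = read_write_ratio(READ_GBPS, WRITE_GBPS)
peak = peak_fp32_gflops(COMPUTE_UNITS, CLOCK_GHZ)
efficiency = MEASURED_FP32_GFLOPS / peak

print(f"read/write ratio: {ratio:.2f}")             # 0.35
print(f"theoretical FP32 peak: {peak:.0f} GFLOPS")  # 2048
print(f"measured/peak: {efficiency:.1%}")           # 98.3%
```

Under these assumptions the compute numbers sit close to the theoretical peak, so the slowdown in this run shows up in the device-to-host transfer path rather than in kernel execution.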

I closed the issue by mistake. Reopening it...

Hi @CardLin, unfortunately the 5700G APU is not supported hardware, and the workaround environment flags will have a performance impact. Sorry, there will be no fix in a future ROCm release for this Cezanne APU issue. Closing the ticket.