ggerganov / llama.cpp

LLM inference in C/C++

AMD ROCm: 8x22B Model Causes 100% GPU Utilization Stall

Trat8547 opened this issue · comments

When running WizardLM-2-8x22B, the model loads into VRAM but then stalls at 100% GPU utilization as soon as it starts processing the KV cache. Power draw sits at 100 W of the 300 W cap and stays there until the server is terminated.
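For reference, a minimal reproduction sketch; the model filename, quantization, and parameter values are illustrative assumptions, not the exact command from the report:

```sh
# Build with ROCm/HIP support first, e.g.: make LLAMA_HIPBLAS=1

# Offload all layers across the two MI100s; the hang appears once
# prompt/KV-cache processing begins (model path and values are assumptions).
./server -m ./models/WizardLM-2-8x22B.Q4_K_M.gguf -ngl 99 -c 4096 --port 8080

# In a second shell, watch utilization and power draw during the stall:
watch -n 1 rocm-smi
```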

Oddly, Llama-3-70B works perfectly with the setup below, but it too fails on other kernel and ROCm versions.

  • OS: Ubuntu 22.04.4
  • Linux Kernel: 5.19.0-50-generic
  • Virtualization: Xen Hypervisor
  • GPU: 2x AMD Instinct MI100
  • ROCm: 6.0.0
  • Llama.cpp/Server Version: Any
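For anyone trying to reproduce, these standard commands report the versions listed above (a convenience sketch, not part of the original report):

```sh
lsb_release -d                 # OS release (Ubuntu 22.04.4)
uname -r                       # kernel (5.19.0-50-generic)
rocm-smi --showproductname     # GPUs (2x MI100)
dpkg -l rocm-libs | tail -1    # installed ROCm metapackage version (6.0.0)
```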

When switching to kernel 6.5 with ROCm 6.0 or 6.1, neither Llama-3-70B nor WizardLM-2-8x22B works; both hit the same 100% stall.

  • iommu=pt (added to the kernel command line, as sketched below) has no effect
  • GPU_MAX_HW_QUEUES=1 has no effect for any ROCm version or kernel
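For completeness, a sketch of how these two mitigations are typically applied; the GRUB workflow and the server invocation are illustrative, only `iommu=pt` and `GPU_MAX_HW_QUEUES=1` come from the report:

```sh
# IOMMU passthrough: append iommu=pt to the kernel command line
# (illustrative GRUB workflow; any boot-parameter mechanism works).
sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&iommu=pt /' /etc/default/grub
sudo update-grub && sudo reboot

# Limit ROCm to a single hardware queue for the server process only
# (model path and -ngl value are illustrative assumptions):
GPU_MAX_HW_QUEUES=1 ./server -m ./models/WizardLM-2-8x22B.Q4_K_M.gguf -ngl 99
```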