AMD ROCm: 8x22B Model Causes 100% GPU Utilization Stall
Trat8547 opened this issue
When running WizardLM-2-8x22B, the model loads into VRAM but then freezes at 100% GPU utilization while attempting to process the KV cache. Power draw sits at 100W of the 300W limit and stays there until the server is terminated.
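For reference, the stalled state is easy to observe with `rocm-smi`; a minimal monitoring sketch, assuming the standard ROCm tooling is on PATH:

```shell
# Poll GPU utilization and power draw once per second while the server
# is stuck; during the stall both MI100s report ~100% use at ~100W.
watch -n 1 'rocm-smi --showuse --showpower'
```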
Oddly, Llama-3-70B works perfectly with the setup below, but fails on other kernels and ROCm versions.
- OS: Ubuntu 22.04.4
- Linux Kernel: 5.19.0-50-generic
- Virtualization: Xen Hypervisor
- GPU: x2 MI100
- ROCm: 6.0.0
- Llama.cpp/Server Version: Any
When switching to kernel 6.5 with ROCm 6.0 or 6.1, neither Llama-3-70B nor WizardLM-2-8x22B works; both hit the same 100% stall.
- Booting with iommu=pt has no effect
- GPU_MAX_HW_QUEUES=1 has no effect on any ROCm version or kernel
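For anyone trying to reproduce, a sketch of how the two workarounds above were applied; the server binary and model path are placeholders, not from this report:

```shell
# Kernel boot parameter (no effect here): add iommu=pt to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
#   sudo update-grub && sudo reboot

# Per-launch workaround (also no effect): cap the HSA hardware queue
# count in the environment before starting the llama.cpp server.
GPU_MAX_HW_QUEUES=1 ./llama-server -m ./models/model.gguf -ngl 99
```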