triton-lang / triton

Development repository for the Triton language and compiler

Home Page: https://triton-lang.org/


Cannot specify which device to use

AndrejHafner opened this issue · comments

Hello.

I ran into an issue while trying to compile a PyTorch model. I have a node with multiple GPUs and was using device cuda:1. Nothing was running on that GPU and its VRAM was not utilized, yet compilation failed with a CUDA OOM error, thrown on these lines:

if fast_flush:
    cache = torch.empty(int(cache_size // 4), dtype=torch.int, device='cuda')
else:
    cache = torch.empty(int(cache_size), dtype=torch.int8, device='cuda')

It looks like the code does not use the device the model is on, but always uses 'cuda', which defaults to cuda:0. In my case, device cuda:0 was already in use and its memory was completely full, which led to the OOM error.

It would be good if the testing code here used the model's device, or allowed the device to be specified explicitly; a sketch of what that could look like follows.
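For illustration only, here is a minimal sketch of a device-aware allocation. flush_cache is a hypothetical helper, not Triton's actual API; the point is simply to fall back to the active CUDA device instead of the hard-coded 'cuda':

import torch

def flush_cache(cache_size: int, fast_flush: bool = True, device=None):
    # Honor an explicit device, falling back to the currently active
    # CUDA device rather than 'cuda' (which always resolves to cuda:0).
    if device is None:
        device = torch.device('cuda', torch.cuda.current_device())
    if fast_flush:
        # int32 elements, so divide the byte size by 4
        return torch.empty(int(cache_size // 4), dtype=torch.int, device=device)
    return torch.empty(int(cache_size), dtype=torch.int8, device=device)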

Best regards,
Andrej

Use torch.cuda.set_device to change the default CUDA device, or pass 'cuda:1' instead of 'cuda'.

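For example, a minimal sketch of that workaround; model and example_input are placeholders:

import torch

# Make cuda:1 the default CUDA device, so the hard-coded device='cuda'
# in the benchmarking code resolves to it.
torch.cuda.set_device('cuda:1')

# Alternatively, scope the default device to a block:
with torch.cuda.device(1):
    compiled = torch.compile(model)
    out = compiled(example_input)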