Why does MPS need 6 GPUs?
valiantljk opened this issue · comments
Hi,
In the Salus paper, regarding inference, it says:
Salus needs only 1 GPU, achieving 42× utilization improvement,
while the average latency overhead is less than 5ms.
For comparison, MPS needs 6 GPUs.
Could you explain why MPS needs 6 GPUs? What limitation on the GPU prevents it from running more instances of inference tasks?
For MPS, you need to ensure that the sum of all persistent memory (model parameters and framework-internal state) plus the sum of all ephemeral memory across jobs doesn't exceed the GPU memory capacity.
For Salus, the safety condition is relaxed: the sum of all persistent memory plus only the maximum of the ephemeral memory must not exceed the capacity.
It's explained in detail in section 3.3.2 in the paper.
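To make the difference concrete, here is a minimal sketch comparing the two safety conditions. The per-job memory figures and the GPU capacity are made-up numbers for illustration, not measurements from the paper.

```python
# Hypothetical GPU memory capacity in GB (illustrative only).
CAPACITY = 12.0

# (persistent, peak ephemeral) memory per inference job, in GB.
# Persistent = model weights + framework-internal state;
# ephemeral = temporary allocations within an iteration.
jobs = [(1.5, 2.0), (1.5, 2.0), (1.5, 2.0), (1.5, 2.0)]

# MPS: every job's full footprint must fit simultaneously,
# so both persistent and ephemeral memory are summed.
mps_demand = sum(p + e for p, e in jobs)

# Salus: persistent memory for all jobs is summed, but ephemeral
# memory is time-shared across jobs, so only the largest single
# ephemeral allocation counts.
salus_demand = sum(p for p, _ in jobs) + max(e for _, e in jobs)

print(f"MPS demand:   {mps_demand} GB (fits: {mps_demand <= CAPACITY})")
print(f"Salus demand: {salus_demand} GB (fits: {salus_demand <= CAPACITY})")
```

With these numbers, MPS would need 14 GB (so the four jobs cannot share one 12 GB GPU), while Salus needs only 8 GB and can pack all four onto a single GPU. The same relaxation, applied at the paper's scale, is why Salus packs the inference workload onto 1 GPU where MPS needs 6.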