SymbioticLab / Salus

Fine-grained GPU sharing primitives

Why does MPS need 6 GPUs?

valiantljk opened this issue · comments

Hi,
In the Salus paper, regarding inference, it says:

Salus needs only 1 GPU, achieving 42× utilization improvement, 
while the average latency overhead is less than 5ms. 
For comparison, MPS needs 6 GPUs.

Could you explain why MPS needs 6 GPUs? What limitation of the GPU stops it from running more instances of the inference tasks?

commented

For MPS, you need to ensure that the sum of all persistent memory (model and framework-internal) and all ephemeral memory across the jobs doesn't exceed the GPU memory capacity.

For Salus, the safety condition is relaxed: the sum of all persistent memory plus only the maximum ephemeral memory among the jobs must not exceed the capacity.
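
To make the difference concrete, here is a minimal Python sketch of the two conditions as stated in this reply. The `Job` class, the function names, and all memory figures are hypothetical and chosen only for illustration; they are not taken from the paper or from the Salus code base, and the paper's full condition in Section 3.3.2 may be more detailed than this simplification.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Job:
    """Hypothetical per-job memory profile, in MB."""
    persistent_mb: int  # model weights plus framework-internal memory
    ephemeral_mb: int   # peak scratch memory needed while a request runs


def fits_mps(jobs: Sequence[Job], capacity_mb: int) -> bool:
    # MPS condition from the reply: every job's persistent AND ephemeral
    # memory must fit on the GPU at the same time.
    total = sum(j.persistent_mb + j.ephemeral_mb for j in jobs)
    return total <= capacity_mb


def fits_salus(jobs: Sequence[Job], capacity_mb: int) -> bool:
    # Relaxed condition from the reply: all persistent memory, plus only
    # the largest ephemeral requirement among the jobs.
    persistent = sum(j.persistent_mb for j in jobs)
    largest_ephemeral = max((j.ephemeral_mb for j in jobs), default=0)
    return persistent + largest_ephemeral <= capacity_mb


# Made-up numbers: six identical inference jobs on an 8000 MB GPU.
jobs = [Job(persistent_mb=1000, ephemeral_mb=2000) for _ in range(6)]
capacity_mb = 8000

print(fits_mps(jobs, capacity_mb))      # 6 * 3000 = 18000 MB needed
print(fits_salus(jobs, capacity_mb))    # 6 * 1000 + 2000 = 8000 MB needed
print(fits_mps(jobs[:2], capacity_mb))  # 2 * 3000 = 6000 MB needed
```

With these made-up figures, only two jobs satisfy the MPS condition on one GPU, while all six satisfy the relaxed condition, which is the kind of gap that leads to needing more GPUs under MPS.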

It's explained in detail in Section 3.3.2 of the paper.