Cirrus/Disaggregation for GPUs
jcarreira opened this issue
Opening the discussion on GPU disaggregation.
Two things come to mind:
- Attaching GPUs to uInstances
This lets us pay for a cheaper instance. However, GPUs are so much more expensive than any instance that the savings here are likely negligible.
- GPU as a Service model
GPUs are expensive and are exclusively allocated to a single user, yet they are unlikely to be fully utilized at all times. This means they could be shared among concurrent users.
We could build a service that provides a high degree of GPU virtualization by keeping the dataset remote. Isolation between concurrent tasks could be enforced in software (this has been shown to work, e.g., Singularity, though it is unclear whether it holds up in an adversarial setting).
Evaluated data ingestion with TensorFlow on the MNIST dataset (the tutorial on the TensorFlow website) and measured 60-70 MB/s.
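For anyone who wants to reproduce this kind of number, a minimal sketch of how ingestion throughput can be measured (plain Python, with synthetic batches standing in for MNIST data fetched remotely; the function name and batch setup are illustrative, not from the actual evaluation):

```python
import time

def measure_throughput_mbps(batches):
    """Return ingestion throughput in MB/s for an iterable of byte buffers."""
    start = time.perf_counter()
    total_bytes = 0
    for batch in batches:
        total_bytes += len(batch)  # in a real run, this is where training would consume the batch
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6

# Synthetic workload: 100 "batches" of 1 MB each, standing in for
# batches streamed from remote storage.
batches = (b"\x00" * 1_000_000 for _ in range(100))
print(f"{measure_throughput_mbps(batches):.1f} MB/s")
```

The same loop wrapped around an actual remote dataset reader would tell us whether the 60-70 MB/s is bounded by the framework or by the network.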
For future reference, code for iterating over datasets in deep learning frameworks:
https://github.com/dmlc/mxnet/search?utf8=%E2%9C%93&q=IIterator&type=Code
tensorflow/tensorflow#7951
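For context, the iterator interfaces linked above boil down to something like the following sketch (hypothetical class, plain Python, in the style of MXNet's `IIterator`: fixed-size batches plus a `reset` for multiple epochs):

```python
class DataIterator:
    """Minimal batch iterator: yields fixed-size batches from an
    in-memory dataset and supports reset() for multiple epochs."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.cursor = 0

    def reset(self):
        """Rewind to the start of the dataset (new epoch)."""
        self.cursor = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.cursor >= len(self.data):
            raise StopIteration
        batch = self.data[self.cursor:self.cursor + self.batch_size]
        self.cursor += self.batch_size
        return batch

# Example: 10 samples in batches of 4; the last batch is smaller.
it = DataIterator(list(range(10)), batch_size=4)
print(list(it))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

In a disaggregated setup, `__next__` is where the remote fetch would go, which is why this interface is the natural place to hide the remote dataset.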