Cirrus/Disaggregation for GPUs
jcarreira opened this issue
Opening the discussion on GPU disaggregation.
Two things come to mind:
- Attaching GPUs to uInstances
This lets us pay for a cheaper instance. However, GPUs are so much more expensive than any instance that the savings here are likely negligible.
- GPU as a Service model
GPUs are expensive and are exclusively allocated to a single user, yet they are unlikely to be fully utilized at all times. This means they could be shared among concurrent users.
We could build a service that provides a high degree of GPU virtualization by keeping the dataset remote. Isolation between concurrent tasks could be enforced in software (this has been shown to work, e.g., Singularity, though it is unclear whether it holds up in an adversarial setting).
Evaluated data ingestion with TensorFlow on the MNIST dataset (the tutorial on the TensorFlow website) and measured 60-70 MB/s.
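For anyone who wants to reproduce this kind of number, a minimal sketch of how ingestion throughput can be measured (plain Python, with synthetic batches standing in for MNIST data fetched remotely; the function name and batch setup are illustrative, not from the actual evaluation):

```python
import time

def measure_throughput_mbps(batches):
    """Return ingestion throughput in MB/s for an iterable of byte buffers."""
    start = time.perf_counter()
    total_bytes = 0
    for batch in batches:
        total_bytes += len(batch)  # in a real run, this is where training would consume the batch
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 1e6

# Synthetic workload: 100 "batches" of 1 MB each, standing in for
# batches streamed from remote storage.
batches = (b"\x00" * 1_000_000 for _ in range(100))
print(f"{measure_throughput_mbps(batches):.1f} MB/s")
```

The same loop wrapped around an actual remote dataset reader would tell us whether the 60-70 MB/s is bounded by the framework or by the network.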
For future reference, code for iterating over datasets in deep learning frameworks:
https://github.com/dmlc/mxnet/search?utf8=%E2%9C%93&q=IIterator&type=Code
tensorflow/tensorflow#7951
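For context, the iterator interfaces linked above boil down to something like the following sketch (hypothetical class, plain Python, in the style of MXNet's `IIterator`: fixed-size batches plus a `reset` for multiple epochs):

```python
class DataIterator:
    """Minimal batch iterator: yields fixed-size batches from an
    in-memory dataset and supports reset() for multiple epochs."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self.cursor = 0

    def reset(self):
        """Rewind to the start of the dataset (new epoch)."""
        self.cursor = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.cursor >= len(self.data):
            raise StopIteration
        batch = self.data[self.cursor:self.cursor + self.batch_size]
        self.cursor += self.batch_size
        return batch

# Example: 10 samples in batches of 4; the last batch is smaller.
it = DataIterator(list(range(10)), batch_size=4)
print(list(it))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

In a disaggregated setup, `__next__` is where the remote fetch would go, which is why this interface is the natural place to hide the remote dataset.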