jcarreira / cirrus-kv

High-performance key-value store

Cirrus/Disaggregation for GPUs

jcarreira opened this issue · comments

Opening the discussion on GPU disaggregation.

Two things come to mind:

  1. Attaching GPUs to uInstances

This allows us to pay for a cheaper instance. However, GPUs are so much more expensive than any instance that the savings here are likely to be negligible.

  2. GPU as a Service model

GPUs are expensive and are exclusively allocated to a single user. However, they are unlikely to be fully utilized at all times, which means they could be shared among concurrent users.

We could build a service that provides high levels of GPU virtualization by keeping the dataset remote. Isolation between concurrent tasks could be enforced in software (this has been shown to work, e.g., Singularity, but I'm not sure it holds in this adversarial context).
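To make the "dataset stays remote" idea concrete, here is a minimal sketch of a prefetching batch iterator. Everything in it is an assumption for illustration: `InMemoryStore` is a hypothetical stand-in for a remote key-value store (it is not the cirrus-kv API), and `prefetching_batches` and its `depth` parameter are made-up names. The point is only that a background thread can keep a few batches in flight so the GPU side does not wait on every remote read.

```python
import queue
import threading


class InMemoryStore:
    """Hypothetical stand-in for a remote key-value store (not the cirrus-kv API)."""

    def __init__(self, items):
        self._items = dict(items)

    def get(self, key):
        # A real remote store would do a network read here.
        return self._items[key]


def prefetching_batches(store, keys, depth=2):
    """Yield store.get(k) for each key in order, fetching up to `depth`
    batches ahead of the consumer on a background thread."""
    q = queue.Queue(maxsize=depth)

    def worker():
        try:
            for k in keys:
                q.put(("item", store.get(k)))  # blocks while the queue is full
            q.put(("done", None))
        except Exception as exc:
            # Hand the failure to the consumer instead of hanging it.
            q.put(("error", exc))

    # Daemon thread: if the consumer stops early, the worker does not
    # keep the process alive.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        kind, value = q.get()
        if kind == "done":
            return
        if kind == "error":
            raise value
        yield value


# Usage: three tiny "batches" keyed by integer.
store = InMemoryStore({i: bytes([i]) * 4 for i in range(3)})
batches = list(prefetching_batches(store, range(3)))
```

A bounded queue is used so that the prefetcher cannot run arbitrarily far ahead and fill local memory, which would defeat the purpose of keeping the dataset remote.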

Evaluated with TensorFlow on the MNIST dataset (the tutorial on the TensorFlow website) and got 60-70 MB/s.
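The note above does not say how the figure was obtained. Assuming it is the rate at which the training loop consumed input data, a rough way to reproduce that kind of measurement is to time an iterator of byte batches. The helper names below (`consume_and_time`, `mb_per_second`) are hypothetical, and the sketch counts raw bytes only.

```python
import time


def consume_and_time(batches):
    """Drain an iterable of bytes-like batches.

    Returns (total_bytes, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    for batch in batches:
        total += len(batch)
    elapsed = time.perf_counter() - start
    return total, elapsed


def mb_per_second(total_bytes, seconds):
    """Convert a byte count and a duration to MB/s (1 MB = 10**6 bytes)."""
    return total_bytes / 1e6 / seconds


# One MNIST image is 28x28 pixels; at one byte per pixel that is 784 bytes.
MNIST_IMAGE_BYTES = 28 * 28


def images_per_second(mb_s, image_bytes=MNIST_IMAGE_BYTES):
    """Images consumed per second at a given MB/s, assuming raw uint8 pixels."""
    return mb_s * 1e6 / image_bytes
```

In practice the batches would come from the framework's dataset iterator, and `consume_and_time` would wrap the training step rather than a bare loop, so that the measured rate reflects what the GPU actually consumes.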

For future reference, code for iterating over datasets in deep learning frameworks:

https://github.com/dmlc/mxnet/search?utf8=%E2%9C%93&q=IIterator&type=Code
tensorflow/tensorflow#7951