datitran / object_detector_app

Real-Time Object Recognition App with Tensorflow and OpenCV

Home Page: https://medium.com/towards-data-science/building-a-real-time-object-recognition-app-with-tensorflow-and-opencv-b7a2b4ebdc32

Tensorflow consumes all GPU memory

zspasztori opened this issue · comments

Hi,

For some reason I can only run one worker, because TensorFlow automatically assigns all free GPU memory to the first one. It also seems like the object detection is not the bottleneck: if I change the model to a more complex one, the fps does not decrease.

Updating the worker as follows fixes it:
def worker(input_q, output_q):
    # Load a (frozen) Tensorflow model into memory, but let the GPU memory
    # grow on demand instead of grabbing it all up front.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    detection_graph = tf.Graph()
    with detection_graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')

        sess = tf.Session(graph=detection_graph, config=config)

    fps = FPS().start()
    while True:
        fps.update()
        frame = input_q.get()
        output_q.put(detect_objects(frame, sess, detection_graph))

    fps.stop()
    sess.close()
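
If you want to run several workers on one GPU, another option in the TF 1.x config is to cap each session's share of GPU memory explicitly instead of (or in addition to) allow_growth. A minimal sketch, assuming the same worker setup as above; the 0.25 fraction is an arbitrary example value:

    # Hypothetical split: each worker may use at most ~25% of GPU memory,
    # so up to four workers can share one card.
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.25
    sess = tf.Session(graph=detection_graph, config=config)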

Cool. Yeah, I intentionally didn't use the GPU, as I assumed the startup overhead would be bigger for prediction. There is also an interesting thread here that discusses this. Please note that the fps value is an estimate, and it uses the elapsed time since the beginning of the Python app, which means that with a lot of workers or a complex model the fps will be low because of this "bias". I should have taken this into account, but I was lazy. If you run it for longer, the fps rate should improve. This is what I observed.
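
To make that bias concrete: the FPS helper is essentially a frame counter divided by the wall-clock time since start() was called. A minimal sketch of that pattern (not the repo's exact class):

    import datetime

    class SimpleFPS(object):
        # Counts frames and divides by wall-clock time since start(), so
        # model loading and worker start-up are included in the denominator.
        # Short runs therefore under-report the steady-state frame rate.
        def start(self):
            self._start = datetime.datetime.now()
            self._num_frames = 0
            return self

        def update(self):
            self._num_frames += 1

        def fps(self):
            elapsed = (datetime.datetime.now() - self._start).total_seconds()
            return self._num_frames / elapsed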

Yeah, this solves it though. The GPU memory is not the bottleneck though, nor is the image capturing. Somehow the fps is stuck at 10 regardless of the model and the input fps; it never drops below that for me.
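
One way to check where the time actually goes is to time the queue read and the detection call separately inside the worker loop. A hypothetical diagnostic sketch based on the worker above:

    import time

    while True:
        t0 = time.time()
        frame = input_q.get()          # time spent waiting for a frame
        t1 = time.time()
        result = detect_objects(frame, sess, detection_graph)
        t2 = time.time()
        output_q.put(result)
        print('queue wait: %.3fs, detection: %.3fs' % (t1 - t0, t2 - t1))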

The object detection API uses feed_dict, which has significantly worse performance since it has to copy data from Python -> C++ -> GPU memory. The recommended alternative is TensorFlow Serving or queues, which avoid the expensive feed_dict approach.

https://blog.metaflow.fr/tensorflow-how-to-optimise-your-input-pipeline-with-queues-and-multi-threading-e7c3874157e0

@AKSHAYUBHAT Good point! Actually I've read this article from Morgan Giraud before, but had forgotten about the feed_dict issue. Thanks for mentioning it. It's definitely better to avoid feed_dict.

It would be great if someone replaced the feed_dict with queues in this demo. Might see a nice improvement in fps.
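
A minimal, untested sketch of what that could look like with a TF 1.x FIFOQueue. The tensor names image_tensor:0, detection_boxes:0, detection_scores:0 and detection_classes:0 are the standard ones in the Object Detection API frozen graphs; the queue capacity and helper names are made up for illustration. The enqueue side still uses feed_dict, but it can run in its own thread, so the Python -> C++ copy overlaps with inference instead of blocking it:

    import tensorflow as tf

    def build_queued_graph(path_to_ckpt):
        graph = tf.Graph()
        with graph.as_default():
            # Frames are pushed into an in-graph queue instead of being fed
            # directly to image_tensor on every run call.
            frame_ph = tf.placeholder(tf.uint8, shape=[1, None, None, 3])
            queue = tf.FIFOQueue(capacity=8, dtypes=[tf.uint8])
            enqueue_op = queue.enqueue(frame_ph)
            batched = queue.dequeue()
            batched.set_shape([1, None, None, 3])

            od_graph_def = tf.GraphDef()
            with tf.gfile.GFile(path_to_ckpt, 'rb') as fid:
                od_graph_def.ParseFromString(fid.read())
            # Rewire the frozen graph's input to the dequeued tensor.
            tf.import_graph_def(od_graph_def, name='',
                                input_map={'image_tensor:0': batched})
        return graph, frame_ph, enqueue_op

    def enqueue_frames(sess, input_q, frame_ph, enqueue_op):
        # Runs in a background thread; any preprocessing done in
        # detect_objects (e.g. BGR -> RGB) is omitted here.
        while True:
            frame = input_q.get()
            sess.run(enqueue_op, feed_dict={frame_ph: frame[None, ...]})

    def detect_from_queue(sess, graph):
        # Each call dequeues one frame inside the graph - no feed_dict on
        # the inference run itself.
        boxes = graph.get_tensor_by_name('detection_boxes:0')
        scores = graph.get_tensor_by_name('detection_scores:0')
        classes = graph.get_tensor_by_name('detection_classes:0')
        return sess.run([boxes, scores, classes])

The enqueue side would be started with something like threading.Thread(target=enqueue_frames, args=(sess, input_q, frame_ph, enqueue_op)), and the worker loop would then just call detect_from_queue and put the result on output_q.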