datitran / object_detector_app

Real-Time Object Recognition App with Tensorflow and OpenCV

Home Page: https://medium.com/towards-data-science/building-a-real-time-object-recognition-app-with-tensorflow-and-opencv-b7a2b4ebdc32

Tensorflow consumes all GPU memory

zspasztori opened this issue · comments

Hi,

For some reason I can only run one worker, because TensorFlow automatically assigns all free GPU memory to the first one. It also seems like the object detection is not the bottleneck: if I change the model to a more complex one, the fps does not decrease.

Updating the worker as follows fixes it:
def worker(input_q, output_q):
    # Load a (frozen) Tensorflow model into memory, but let the GPU memory
    # grow on demand instead of grabbing it all up front.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    detection_graph = tf.Graph()
    with detection_graph.as_default():
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
            serialized_graph = fid.read()
            od_graph_def.ParseFromString(serialized_graph)
            tf.import_graph_def(od_graph_def, name='')

        sess = tf.Session(graph=detection_graph, config=config)

    fps = FPS().start()
    while True:
        fps.update()
        frame = input_q.get()
        output_q.put(detect_objects(frame, sess, detection_graph))

    fps.stop()
    sess.close()
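
If you want to run several workers on one GPU, another option in the TF 1.x config is to cap each session's share of GPU memory explicitly instead of (or in addition to) allow_growth. A minimal sketch, assuming the same worker setup as above; the 0.25 fraction is an arbitrary example value:

    # Hypothetical split: each worker may use at most ~25% of GPU memory,
    # so up to four workers can share one card.
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.25
    sess = tf.Session(graph=detection_graph, config=config)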

Cool. Yeah, I intentionally didn't use the GPU, as I assumed the startup overhead would be bigger for prediction. There is also an interesting thread here that discusses this. Please note that the fps value is an estimate, and it uses the elapsed time since the beginning of the Python app, which means that with a lot of workers or a complex model the fps will be low because of this "bias". I should have taken this into account, but I was lazy. If you run it for longer, the fps rate should improve. This is what I observed.
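
To make that bias concrete: the FPS helper is essentially a frame counter divided by the wall-clock time since start() was called. A minimal sketch of that pattern (not the repo's exact class):

    import datetime

    class SimpleFPS(object):
        # Counts frames and divides by wall-clock time since start(), so
        # model loading and worker start-up are included in the denominator.
        # Short runs therefore under-report the steady-state frame rate.
        def start(self):
            self._start = datetime.datetime.now()
            self._num_frames = 0
            return self

        def update(self):
            self._num_frames += 1

        def fps(self):
            elapsed = (datetime.datetime.now() - self._start).total_seconds()
            return self._num_frames / elapsed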

Yeah, this solves it though. The GPU memory is not the bottleneck though, nor is the image capturing. Somehow the fps is stuck at 10 regardless of the model and the input fps; it never drops below that for me.
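
One way to check where the time actually goes is to time the queue read and the detection call separately inside the worker loop. A hypothetical diagnostic sketch based on the worker above:

    import time

    while True:
        t0 = time.time()
        frame = input_q.get()          # time spent waiting for a frame
        t1 = time.time()
        result = detect_objects(frame, sess, detection_graph)
        t2 = time.time()
        output_q.put(result)
        print('queue wait: %.3fs, detection: %.3fs' % (t1 - t0, t2 - t1))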

The object detection API uses feed_dict, which has significantly worse performance since it has to copy data from Python -> C++ -> GPU memory. The recommended alternative is TensorFlow Serving or queues, which avoid the expensive feed_dict approach.

https://blog.metaflow.fr/tensorflow-how-to-optimise-your-input-pipeline-with-queues-and-multi-threading-e7c3874157e0

@AKSHAYUBHAT Good point! Actually I've read this article from Morgan Giraud before, but had forgotten about the feed_dict issue. Thanks for mentioning it. It's definitely better to avoid feed_dict.

It would be great if someone replaced the feed_dict with queues in this demo. Might see a nice improvement in fps.
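
A minimal, untested sketch of what that could look like with a TF 1.x FIFOQueue. The tensor names image_tensor:0, detection_boxes:0, detection_scores:0 and detection_classes:0 are the standard ones in the Object Detection API frozen graphs; the queue capacity and helper names are made up for illustration. The enqueue side still uses feed_dict, but it can run in its own thread, so the Python -> C++ copy overlaps with inference instead of blocking it:

    import tensorflow as tf

    def build_queued_graph(path_to_ckpt):
        graph = tf.Graph()
        with graph.as_default():
            # Frames are pushed into an in-graph queue instead of being fed
            # directly to image_tensor on every run call.
            frame_ph = tf.placeholder(tf.uint8, shape=[1, None, None, 3])
            queue = tf.FIFOQueue(capacity=8, dtypes=[tf.uint8])
            enqueue_op = queue.enqueue(frame_ph)
            batched = queue.dequeue()
            batched.set_shape([1, None, None, 3])

            od_graph_def = tf.GraphDef()
            with tf.gfile.GFile(path_to_ckpt, 'rb') as fid:
                od_graph_def.ParseFromString(fid.read())
            # Rewire the frozen graph's input to the dequeued tensor.
            tf.import_graph_def(od_graph_def, name='',
                                input_map={'image_tensor:0': batched})
        return graph, frame_ph, enqueue_op

    def enqueue_frames(sess, input_q, frame_ph, enqueue_op):
        # Runs in a background thread; any preprocessing done in
        # detect_objects (e.g. BGR -> RGB) is omitted here.
        while True:
            frame = input_q.get()
            sess.run(enqueue_op, feed_dict={frame_ph: frame[None, ...]})

    def detect_from_queue(sess, graph):
        # Each call dequeues one frame inside the graph - no feed_dict on
        # the inference run itself.
        boxes = graph.get_tensor_by_name('detection_boxes:0')
        scores = graph.get_tensor_by_name('detection_scores:0')
        classes = graph.get_tensor_by_name('detection_classes:0')
        return sess.run([boxes, scores, classes])

The enqueue side would be started with something like threading.Thread(target=enqueue_frames, args=(sess, input_q, frame_ph, enqueue_op)), and the worker loop would then just call detect_from_queue and put the result on output_q.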