First call to sess.run() at inference time is slow
thomasweng15 opened this issue · comments
Hi, have you encountered an issue where the first call to sess.run() in contact_grasp_estimator.py is slow? I am running the inference example in the readme, and when I time sess.run() the first call takes much longer than subsequent calls:
```
Run inference 1162.3998165130615
Preprocess pc for inference 0.0007269382476806641
Run inference 0.2754530906677246
Preprocess pc for inference 0.0006759166717529297
```
I found this thread on what seems to be a similar issue, but the simple resolutions there have not worked, and I have not tried compiling TensorFlow from source yet. I am running on an RTX 3090 with CUDA 11.1 and tensorflow-gpu==2.2. Have you encountered this issue before? Thanks for your help.
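Aside: first-call slowness like this usually comes from one-time costs (graph optimization, GPU memory pool growth, and CUDA kernel JIT compilation when the GPU is newer than the architectures the TF wheel was built for) rather than the model itself. A minimal timing harness makes the pattern visible; this is a sketch, and the `sess.run` lambda in the usage comment is illustrative, not part of contact_graspnet:

```python
import time

def time_calls(fn, n_calls=3):
    """Return per-call latencies for n_calls invocations of fn.

    A first latency much larger than the rest points to one-time startup
    cost (graph build, memory allocation, CUDA kernel JIT) rather than a
    genuinely slow model; in that case a throwaway warm-up call before
    timing gives the steady-state number.
    """
    latencies = []
    for _ in range(n_calls):
        t0 = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - t0)
    return latencies

# Hypothetical usage against a TF1-style session (names are illustrative):
#   lats = time_calls(lambda: sess.run(outputs, feed_dict=feed), n_calls=5)
#   print("first call: %.2fs, rest: %s" % (lats[0], lats[1:]))
```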
The quality of the grasps is also much worse than expected:
I have tried recompiling the pointnet tf ops using this script: https://github.com/NVlabs/contact_graspnet/blob/main/compile_pointnet_tfops.sh, but the problem persists. I did the same setup on another, brand-new machine, also with an RTX 3090 but with CUDA 11.2, and hit the same slowdown and the same poor grasp quality.
Regarding inference speed: on the desktops where I have tried it, the first inference may take 2-3 seconds, but not 1162 seconds... I am not sure why it takes so much longer on your machine.
Regarding the poor grasp quality: something is terribly wrong here. I assume you already checked `git status` and nothing in the repo has changed. I have tested this code with CUDA 11.1 on multiple machines with no problems. Can you try CUDA 11.1 with tensorflow-gpu 2.2.0? In other projects with custom CUDA ops (in PyTorch), I have seen discrepancies between CUDA versions (I know it's surprising, but I have seen it).
@thomasweng15 let me know if setting up cuda 11.1 fixes the issue for you.
I switched to CUDA 11.1 and ran it with tensorflow-gpu 2.2, but had the same issue. I then upgraded to tensorflow-gpu 2.5, reasoning that the 3080 and 3090 GPUs were too new for earlier tensorflow-gpu versions, and knowing that my labmate also uses 2.5. I had to recompile the pointnet tf_ops and install cudnn 8.1 and cudatoolkit 11.0 from conda-forge. The problem is now fixed: the first inference runs in about 2 seconds, and the predictions look much better:
So the takeaway is that users with newer 30xx GPUs should upgrade to tensorflow-gpu==2.5.
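For anyone else hitting this on a 30xx card, a conda environment file matching the versions that worked above might look like the following sketch. The environment name and Python pin are assumptions; the `cudatoolkit`/`cudnn` pins are the ones reported in this thread:

```yaml
name: contact_graspnet_tf25
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8        # assumed; use whatever the repo's setup expects
  - cudatoolkit=11.0  # from conda-forge, as described above
  - cudnn=8.1         # from conda-forge, as described above
  - pip
  - pip:
      - tensorflow-gpu==2.5
```

After creating the environment, the pointnet tf_ops still need to be recompiled (see compile_pointnet_tfops.sh) so the custom ops link against the new TensorFlow.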
@thomasweng15 Hi, do you have a .yml file for the new environment (tensorflow-gpu 2.5, CUDA 11.0, cuDNN 8.1)?