First call to sess.run() at inference time is slow
thomasweng15 opened this issue · comments
Hi, have you encountered an issue where the first call to sess.run() in contact_grasp_estimator.py is slow? I am running the inference example in the readme, and when I time sess.run() the first call takes much longer than subsequent calls:
```
Run inference 1162.3998165130615
Preprocess pc for inference 0.0007269382476806641
Run inference 0.2754530906677246
Preprocess pc for inference 0.0006759166717529297
```
I found this thread on what seems to be a similar issue, but the simple resolutions there have not worked, and I have not tried compiling TensorFlow from source yet. I am running on an RTX 3090 with CUDA 11.1 and tensorflow-gpu==2.2. Have you encountered this issue before? Thanks for your help.
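Aside: first-call slowness like this usually comes from one-time costs (graph optimization, GPU memory pool growth, and CUDA kernel JIT compilation when the GPU is newer than the architectures the TF wheel was built for) rather than the model itself. A minimal timing harness makes the pattern visible; this is a sketch, and the `sess.run` lambda in the usage comment is illustrative, not part of contact_graspnet:

```python
import time

def time_calls(fn, n_calls=3):
    """Return per-call latencies for n_calls invocations of fn.

    A first latency much larger than the rest points to one-time startup
    cost (graph build, memory allocation, CUDA kernel JIT) rather than a
    genuinely slow model; in that case a throwaway warm-up call before
    timing gives the steady-state number.
    """
    latencies = []
    for _ in range(n_calls):
        t0 = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - t0)
    return latencies

# Hypothetical usage against a TF1-style session (names are illustrative):
#   lats = time_calls(lambda: sess.run(outputs, feed_dict=feed), n_calls=5)
#   print("first call: %.2fs, rest: %s" % (lats[0], lats[1:]))
```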
The quality of the grasps is also much worse than expected:
I have tried recompiling the pointnet tf ops using this script: https://github.com/NVlabs/contact_graspnet/blob/main/compile_pointnet_tfops.sh, but the problem persists. I did the same setup on another, brand-new machine, also with an RTX 3090 but with CUDA 11.2, and hit the same slowdown and the same poor grasp quality.
Regarding inference speed: on the desktops where I have tried it, the first inference may take 2-3 seconds, but not 1162 seconds... I am not sure why it takes so much longer on your machine.
Regarding the poor grasp quality: something is terribly wrong here. I assume you already checked `git status` and nothing in the repo has changed. I have tested this code with CUDA 11.1 on multiple machines with no problems. Can you try CUDA 11.1 with tensorflow-gpu 2.2.0? In other projects with custom CUDA ops (in PyTorch), I have seen discrepancies between CUDA versions (I know it's surprising, but I have seen it).
@thomasweng15 let me know if setting up cuda 11.1 fixes the issue for you.
I switched to CUDA 11.1 and ran it with tensorflow-gpu 2.2, but had the same issue. I then upgraded to tensorflow-gpu 2.5, reasoning that the 3080 and 3090 GPUs were too new for earlier tensorflow-gpu versions, and knowing that my labmate also uses 2.5. I had to recompile the pointnet tf_ops and install cudnn 8.1 and cudatoolkit 11.0 from conda-forge. The problem is now fixed: the first inference runs in about 2 seconds, and the predictions look much better:
So the takeaway is that users with newer 30xx GPUs should upgrade to tensorflow-gpu==2.5.
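For anyone else hitting this on a 30xx card, a conda environment file matching the versions that worked above might look like the following sketch. The environment name and Python pin are assumptions; the `cudatoolkit`/`cudnn` pins are the ones reported in this thread:

```yaml
name: contact_graspnet_tf25
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8        # assumed; use whatever the repo's setup expects
  - cudatoolkit=11.0  # from conda-forge, as described above
  - cudnn=8.1         # from conda-forge, as described above
  - pip
  - pip:
      - tensorflow-gpu==2.5
```

After creating the environment, the pointnet tf_ops still need to be recompiled (see compile_pointnet_tfops.sh) so the custom ops link against the new TensorFlow.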
@thomasweng15 Hi, do you have a .yml file for the new environment (tensorflow-gpu 2.5, CUDA 11.0, cuDNN 8.1)?