[ROS2] SegmentFromPoint should run `set_image` before the action call
amalnanavati opened this issue
NVIDIA uses adaptive clocking on its GPUs, which adjusts the GPU clock based on the amount of power being supplied to the computer. As a result, when connected to wall power, the GPU operates at a max Graphics Clock rate of 2100MHz and Memory Transfer Rate of 14002MHz. When connected to the wheelchair power, it drops to 2100MHz and 1620MHz, respectively, and when running on battery power it drops to 420MHz and 810MHz. (Open NVIDIA X Server Settings to the PowerMizer tab to see these values.) Further, when connected to the wheelchair power (via USB-C), Lenovo seems to have some setting that results in the laptop not charging when the battery is close to full; thus, from NVIDIA's perspective, it is sometimes on battery power.
This results in SegmentAnything taking ~0.6s when connected to wall power, but ~2.5-4.0s when connected to wheelchair power.
However, we can solve this by leveraging the fact that SegmentAnything is designed to make multiple queries on the same image. The bulk of the time is spent in the `set_image` function, while the `predict` function is quite fast. Hence, we can run `set_image` while the user is specifying the seed point, by doing the following:
- Create a separate service call in the `FoodSegmentation` node that runs `set_image`. Cache the image.
- On the app side, call that service as soon as the Bite Selection page is loaded.
- When the app calls the `SegmentFromPoint` action, check the current image against the cached image. If it is sufficiently similar, don't re-run `set_image`.
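The caching logic in the steps above could be sketched roughly as follows. This is a minimal illustration, not the actual node code: `CachedSegmenter`, `run_set_image`, the mean-absolute-difference similarity metric, and the threshold value are all hypothetical stand-ins for whatever the `FoodSegmentation` node actually uses.

```python
import numpy as np


class CachedSegmenter:
    """Sketch: cache the result of the expensive `set_image` step and reuse
    it when the incoming image is close enough to the cached one."""

    def __init__(self, similarity_threshold: float = 5.0):
        # Hypothetical threshold on mean absolute pixel difference.
        self.similarity_threshold = similarity_threshold
        self.cached_image = None
        self.cached_embedding = None
        self.n_set_image_calls = 0  # for illustration/testing only

    def run_set_image(self, image: np.ndarray):
        # Placeholder for the expensive SegmentAnything embedding step.
        return image.astype(np.float32).mean()

    def set_image_service(self, image: np.ndarray) -> None:
        """Service callback: the app calls this as soon as the Bite
        Selection page loads."""
        self.n_set_image_calls += 1
        self.cached_embedding = self.run_set_image(image)
        self.cached_image = image

    def segment_from_point(self, image: np.ndarray, seed_point):
        """Action callback: reuse the cached embedding if the new image is
        sufficiently similar to the cached one; otherwise re-embed."""
        cache_miss = (
            self.cached_image is None
            or self.cached_image.shape != image.shape
            or np.abs(
                image.astype(np.float32) - self.cached_image.astype(np.float32)
            ).mean() > self.similarity_threshold
        )
        if cache_miss:
            self.set_image_service(image)
        # The fast `predict` call would run here; stubbed out in this sketch.
        return self.cached_embedding, seed_point
```

With this structure, an action call on an unchanged camera view skips the embedding step entirely and only pays the cheap `predict` cost.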
There will be some edge cases to think about, such as what should happen if the user calls the action while the service is running.
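One simple way to handle that edge case (a hypothetical sketch, not something from the issue): serialize the embedding step behind a lock, so an action call that arrives while the service is still mid-`set_image` waits for the in-flight result rather than starting a second embedding. The `SetImageGuard` name and `embed_fn` parameter are illustrative only.

```python
import threading


class SetImageGuard:
    """Sketch: ensure the expensive embedding step runs at most once, even
    if the service callback and the action callback race on it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._embedding = None

    def ensure_embedding(self, image, embed_fn):
        # If another thread is inside embed_fn, block here until it
        # finishes, then return its cached result instead of recomputing.
        with self._lock:
            if self._embedding is None:
                self._embedding = embed_fn(image)
            return self._embedding
```

In the real node, the action callback would call `ensure_embedding` before `predict`, and a new camera image would reset `_embedding` to `None` (subject to the similarity check described above).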
Note that this issue can be solved concurrently with #131.
Note that I addressed this in a different way than proposed, by switching to EfficientSAM. With the proposed method, Bite Selection would still take 3-4s before getting a mask (on wheelchair power). While that would be a speedup for people using voice control, for people using touch-based interaction the entire Bite Selection currently takes 3-4s on median, so it would be a net slowdown. EfficientSAM, however, takes 1s, which speeds it up for those users as well.
Note that EfficientSAM no longer has separate `set_image` and prompting phases, so the proposed approach cannot be layered on top of the EfficientSAM approach.