[ROS2] SegmentFromPoint should run `set_image` before the action call
amalnanavati opened this issue
NVIDIA uses adaptive clocking on its GPUs, which adjusts the GPU clock based on the amount of power being supplied to the computer. As a result, when connected to wall power, the GPU operates at a max Graphics Clock rate of 2100MHz and Memory Transfer Rate of 14002MHz. When connected to the wheelchair power, it drops to 2100MHz and 1620MHz, respectively, and when running on battery power it drops to 420MHz and 810MHz. (Open NVIDIA X Server Settings to the PowerMizer tab to see these values.) Further, when connected to the wheelchair power (via USB-C), Lenovo seems to have some setting that results in the laptop not charging when the battery is close to full; thus, from NVIDIA's perspective, it is sometimes on battery power.
This results in SegmentAnything taking ~0.6s when connected to wall power, but ~2.5-4.0s when connected to wheelchair power.
However, we can solve this by leveraging the fact that SegmentAnything is designed to make multiple queries on the same image. The bulk of the time is spent in the `set_image` function, while the `predict` function is quite fast. Hence, we can run `set_image` while the user is specifying the seed point, by doing the following:
- Create a separate service call in the `FoodSegmentation` node that runs `set_image`. Cache the image.
- On the app side, call that service as soon as the Bite Selection page is loaded.
- When the app calls the `SegmentFromPoint` action, check the current image against the cached image. If it is sufficiently similar, don't re-run `set_image`.
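The caching logic in the steps above could be sketched roughly as follows. This is a minimal illustration, not the actual node code: `CachedSegmenter`, `run_set_image`, the mean-absolute-difference similarity metric, and the threshold value are all hypothetical stand-ins for whatever the `FoodSegmentation` node actually uses.

```python
import numpy as np


class CachedSegmenter:
    """Sketch: cache the result of the expensive `set_image` step and reuse
    it when the incoming image is close enough to the cached one."""

    def __init__(self, similarity_threshold: float = 5.0):
        # Hypothetical threshold on mean absolute pixel difference.
        self.similarity_threshold = similarity_threshold
        self.cached_image = None
        self.cached_embedding = None
        self.n_set_image_calls = 0  # for illustration/testing only

    def run_set_image(self, image: np.ndarray):
        # Placeholder for the expensive SegmentAnything embedding step.
        return image.astype(np.float32).mean()

    def set_image_service(self, image: np.ndarray) -> None:
        """Service callback: the app calls this as soon as the Bite
        Selection page loads."""
        self.n_set_image_calls += 1
        self.cached_embedding = self.run_set_image(image)
        self.cached_image = image

    def segment_from_point(self, image: np.ndarray, seed_point):
        """Action callback: reuse the cached embedding if the new image is
        sufficiently similar to the cached one; otherwise re-embed."""
        cache_miss = (
            self.cached_image is None
            or self.cached_image.shape != image.shape
            or np.abs(
                image.astype(np.float32) - self.cached_image.astype(np.float32)
            ).mean() > self.similarity_threshold
        )
        if cache_miss:
            self.set_image_service(image)
        # The fast `predict` call would run here; stubbed out in this sketch.
        return self.cached_embedding, seed_point
```

With this structure, an action call on an unchanged camera view skips the embedding step entirely and only pays the cheap `predict` cost.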
There will be some edge cases to think about, such as what should happen if the user calls the action while the service is running.
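One simple way to handle that edge case (a hypothetical sketch, not something from the issue): serialize the embedding step behind a lock, so an action call that arrives while the service is still mid-`set_image` waits for the in-flight result rather than starting a second embedding. The `SetImageGuard` name and `embed_fn` parameter are illustrative only.

```python
import threading


class SetImageGuard:
    """Sketch: ensure the expensive embedding step runs at most once, even
    if the service callback and the action callback race on it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._embedding = None

    def ensure_embedding(self, image, embed_fn):
        # If another thread is inside embed_fn, block here until it
        # finishes, then return its cached result instead of recomputing.
        with self._lock:
            if self._embedding is None:
                self._embedding = embed_fn(image)
            return self._embedding
```

In the real node, the action callback would call `ensure_embedding` before `predict`, and a new camera image would reset `_embedding` to `None` (subject to the similarity check described above).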
Note that this issue can be solved concurrently with #131.
Note that I addressed this in a different way than proposed, by switching to EfficientSAM. With the proposed method, Bite Selection would still take 3-4s before getting a mask (on wheelchair power). While that would be a speedup for people using voice control, for people using touch-based interaction the entire Bite Selection currently takes 3-4s on median, so it would be a net slowdown. EfficientSAM, however, takes 1s, which speeds it up for those users as well.
Note that EfficientSAM no longer has separate `set_image` and prompting phases, so the proposed approach cannot be layered on top of the EfficientSAM approach.