Segmentation array option in Galactic

Question


ngeerlingsriwo opened this issue 2 years ago · comments

Nathalie Geerlings commented 2 years ago

Dear maintainers,

vision_msgs provides communication options for classification and detection tasks, but what about segmentation tasks? Up to and including ROS 2 Foxy I (mis)used the DetectionXDArray messages for this, putting the segmented binary mask and the segmented point cloud in source_img and source_cloud respectively. I noticed that these fields have disappeared in the Galactic branch. What would be the recommended method for segmentation arrays in Galactic?

I could think of two ways to do this:

1. Add SegmentationXDArray messages, with the SegmentationXD message being:

std_msgs/Header header
ObjectHypothesisWithPose[] results
sensor_msgs/Image mask (for 2D) or sensor_msgs/PointCloud2 cloud (for 3D)
string id

2. Use a time synchroniser, as you state in the README. For a single segmentation that would work fine: you could use vision_msgs::msg::Detection2D with sensor_msgs::msg::Image. But what happens if your input results in multiple segmentations? Would you then use vision_msgs::msg::Detection2DArray in synchronisation with a custom message containing an array of sensor_msgs::msg::Image instances with ids, and then manually match the ids of the Detection2D instances with the Image instances? This feels a bit tedious.
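The manual id matching described in the second option can be sketched roughly as follows. This is only an illustration in plain Python: `Detection`, `Mask`, and `match_by_id` are hypothetical stand-ins, not vision_msgs or sensor_msgs types, and it assumes both arrays have already been synchronised by timestamp and that the segmentation node writes the same id into each detection and its mask.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical stand-ins for the ROS messages; only the fields needed
# for matching are modelled here.
@dataclass
class Detection:
    id: str        # mirrors the `id` field of Detection2D
    label: str     # simplified placeholder for the hypothesis results


@dataclass
class Mask:
    id: str        # assumed to be set by the segmentation node
    pixels: bytes  # placeholder for the image payload


def match_by_id(
    detections: List[Detection], masks: List[Mask]
) -> Tuple[List[Tuple[Detection, Mask]], List[str]]:
    """Pair each detection with the mask that shares its id.

    Returns the matched pairs (in detection order) and the ids of
    detections for which no mask was found.
    """
    masks_by_id: Dict[str, Mask] = {m.id: m for m in masks}
    pairs: List[Tuple[Detection, Mask]] = []
    unmatched: List[str] = []
    for det in detections:
        mask = masks_by_id.get(det.id)
        if mask is None:
            unmatched.append(det.id)
        else:
            pairs.append((det, mask))
    return pairs, unmatched
```

Every subscriber would need some version of this bookkeeping, including a policy for unmatched ids, which is the tedious part.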

Please let me know what you think.

Adam Allevato · Answer 1 · Tue Feb 08 2022 10:15:14 GMT+0800 (China Standard Time)

I'm a bit confused about your use case. Your 1st option proposes an image or pointcloud mask, but then also includes an object pose. Does the output of your segmentation also include pose information?

As you have seen, we have made a conscious decision to have vision_msgs not depend on sensor_msgs. Segmentations can be represented in several different ways (for example, you might have a 2D mask for a 3D point cloud, and there isn't any standard I know of for how to label pixels that aren't part of any segmented object). So I think that any implementation that used point cloud or image data would have to be made as part of your own separate repository, rather than by re-adding the sensor_msgs dependency.

But back to your original question...

It sounds like you might have a pipeline where a single point cloud (I will ignore 2D for now) is segmented into multiple smaller clouds, and each of these has its pose detected. In this case, you are generating pairs of detections and point cloud masks. I could see a couple of message definitions like the following:

Segmentation3DResult.msg
sensor_msgs/PointCloud2 source
Segmentation3D[] segmentations

Segmentation3D.msg
sensor_msgs/PointCloud2 mask
vision_msgs/Detection3D detection

Of course, these are just suggestions, and I wouldn't support adding them to this repository because of the reasons listed above (not standardized).
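To make the nesting of the two suggested messages concrete, here is a rough plain-Python sketch of the same structure. The class names mirror the suggested .msg files, but `PointCloud`, the simplified `Detection3D`, and the helper `mask_sizes` are hypothetical placeholders, not real ROS types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PointCloud:
    # Placeholder for sensor_msgs/PointCloud2: just a list of XYZ points.
    points: List[Tuple[float, float, float]] = field(default_factory=list)


@dataclass
class Detection3D:
    # Placeholder for vision_msgs/Detection3D; only the id is modelled.
    id: str = ""


@dataclass
class Segmentation3D:
    mask: PointCloud        # the points belonging to this segment
    detection: Detection3D  # the detection paired with that segment


@dataclass
class Segmentation3DResult:
    source: PointCloud      # the full input cloud
    segmentations: List[Segmentation3D] = field(default_factory=list)


def mask_sizes(result: Segmentation3DResult) -> Dict[str, int]:
    """Map each detection id to the number of points in its mask."""
    return {s.detection.id: len(s.mask.points) for s in result.segmentations}
```

Because each mask travels inside the same message as its detection, a consumer needs no id matching or time synchronisation to pair them.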

Steve Macenski · Answer 2 · Tue Feb 08 2022 10:30:32 GMT+0800 (China Standard Time)

I think having a message type for 2D segmentation from a typical 2D-AI segmentation algorithm would be valuable. It would be similar to image data but having additional metadata fields and removing some other non-relevant fields.

I was having a conversation with another group about this topic just days ago as well and I was surprised to see that gap in this package.

Martin Günther · Answer 3 · Tue Feb 08 2022 23:56:47 GMT+0800 (China Standard Time)

@SteveMacenski wrote:

I think having a message type for 2D segmentation from a typical 2D-AI segmentation algorithm would be valuable. It would be similar to image data but having additional metadata fields and removing some other non-relevant fields.

For normal segmentation images (i.e., where each pixel is labeled with a semantic class), I've always been using normal sensor_msgs/Image messages. The advantage compared to custom messages is that you can use all standard ROS functionalities for processing images: working with cv_bridge, image_transport and OpenCV in your C++ code, using image_proc and image_geometry to rectify your images, project them into a different camera's frame, generate a point cloud with semantic labels and so on, often without writing a single line of code. So I would think twice before defining a "segmentation image" message that is almost identical to a standard sensor_msgs/Image, but not fully.
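As a rough illustration of the layout this implies, the sketch below packs per-pixel class labels into a row-major 8-bit buffer, using field names that mirror sensor_msgs/Image (height, width, encoding, step, data). `LabelImage`, `pack_labels`, and `binary_mask` are hypothetical helpers written in plain Python, and the choice of "mono8" with one byte per class id is an assumption rather than an established convention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LabelImage:
    # Field names follow sensor_msgs/Image; this is not the ROS type itself.
    height: int
    width: int
    encoding: str
    step: int    # bytes per row
    data: bytes  # row-major pixel buffer


def pack_labels(rows: List[List[int]]) -> LabelImage:
    """Pack class ids (0..255) into a single-channel 8-bit image buffer."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same width")
    # bytes() raises ValueError if any class id is outside 0..255.
    data = bytes(value for row in rows for value in row)
    return LabelImage(height, width, "mono8", width, data)


def binary_mask(img: LabelImage, class_id: int) -> List[List[int]]:
    """Return a 0/1 mask marking the pixels labelled with class_id."""
    return [
        [1 if img.data[r * img.step + c] == class_id else 0
         for c in range(img.width)]
        for r in range(img.height)
    ]
```

Which value, if any, marks pixels that belong to no class is left open here and would have to be agreed between publisher and subscriber.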

I realize this is a bit of a tangent and not 100% relevant to @ngeerlingsriwo's original question.