ros-perception / vision_msgs

Algorithm-agnostic computer vision message types for ROS.

Keypoints and polygons instead of bounding boxes

mistermult opened this issue

Hi,

currently I'm integrating a face detector. It does not emit bounding boxes, but key points, e.g. one point at the center of the left eye, one at the center of the right eye, and one at the center of the mouth (simplified). There are two possible ways to model this in vision_msgs:

  1. Use one detection for each key point, with a single hypothesis. Make a bounding box of size 0x0 at the key point and set hypothesis.id = "left_eye"/"right_eye" (a rough sketch follows below this item).
    ++ No need to extend vision_msgs.
    -- How to group multiple faces in one image? Maybe by hypothesis.id = "left_eye_1", but this would generate ids that are not in the database. Maybe by tracking_id?
    -- Even if faces are grouped by some id, clients have to iterate through all detections and group them to find all key points of one face.
    -- Does not describe the domain; clients need more implicit knowledge.
    -- Cannot describe polygons.
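
For concreteness, a minimal sketch of this first option, assuming the Detection2D layout quoted further below (string hypothesis id, is_tracking/tracking_id, and a BoundingBox2D whose center is a geometry_msgs/Pose2D); the exact field names depend on the vision_msgs version in use:

```python
# Sketch of option 1: one Detection2D per key point with a 0x0 bounding box,
# the key point name in hypothesis.id and the face index in tracking_id.
from std_msgs.msg import Header
from vision_msgs.msg import Detection2D, Detection2DArray, ObjectHypothesisWithPose


def keypoint_detection(name, u, v, score, face_idx, header):
    det = Detection2D()
    det.header = header
    det.bbox.center.x = float(u)   # zero-size box centered on the key point
    det.bbox.center.y = float(v)
    det.bbox.size_x = 0.0
    det.bbox.size_y = 0.0
    hyp = ObjectHypothesisWithPose()
    hyp.id = name                  # e.g. "left_eye"
    hyp.score = score
    det.results.append(hyp)
    det.is_tracking = True
    det.tracking_id = "face_%d" % face_idx  # groups the key points of one face
    return det


header = Header(frame_id="camera")
msg = Detection2DArray(header=header)
for name, (u, v, s) in {"left_eye": (120, 80, 0.93),
                        "right_eye": (160, 82, 0.91),
                        "mouth": (140, 130, 0.88)}.items():
    msg.detections.append(keypoint_detection(name, u, v, s, 0, header))
```

The downside noted above is visible here: a subscriber has to collect all detections with the same tracking_id to reassemble one face.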

  2. In the annotation tool CVAT, which is connected to OpenCV, you can either use bounding boxes or an array of points. The latter is used to annotate key points or arbitrary polygon shapes. If the points are meant to be key points, their order implicitly defines what each point describes, e.g. the first point is always the left eye, etc. This would yield a new message (a sketch of the ordering convention follows the definition):

#Detection2D.msg
Header header

# Class probabilities
ObjectHypothesisWithPose[] results

# 2D bounding box surrounding the object.
BoundingBox2D bbox
# Key points in the image or an arbitrary polygon.
geometry_msgs/Point2D[] points

# The 2D data that generated these results (i.e. region proposal cropped out of
#   the image). Not required for all use cases, so it may be empty.
sensor_msgs/Image source_img

# If true, this message contains object tracking information.
bool is_tracking

# ID used for consistency across multiple detection messages. This value will
#   likely differ from the id field set in each individual ObjectHypothesis.
# If you set this field, be sure to also set is_tracking to True.
string tracking_id
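
To make the implicit-order convention concrete, a rough sketch in plain Python (Point2D is a stand-in here, since the points field above is only a proposal and does not exist in vision_msgs yet):

```python
# Sketch of the fixed key point order from the proposal: the index of a point
# in `points` carries its meaning, analogous to CVAT's keypoint annotation.
from collections import namedtuple

Point2D = namedtuple("Point2D", ["x", "y"])  # stand-in for the proposed type

# Convention shared by detector and clients (index -> meaning):
FACE_KEYPOINT_ORDER = ("left_eye", "right_eye", "mouth")

# What a face detector would put into the proposed Detection2D.points field:
points = [Point2D(120.0, 80.0),    # index 0 -> left_eye
          Point2D(160.0, 82.0),    # index 1 -> right_eye
          Point2D(140.0, 130.0)]   # index 2 -> mouth

# A client recovers the semantics purely from the agreed order:
keypoints = dict(zip(FACE_KEYPOINT_ORDER, points))
print(keypoints["left_eye"])  # Point2D(x=120.0, y=80.0)
```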

-- Need to extend vision_msgs.
++ Can also describe key points and polygons.
++ Describes the domain with key points.
++ One array of points can model all the other shapes of other annotation tools.
-- More complexity, especially if clients want to support polygons. Key points should be OK; clients might just ignore them.
-- Clients might get a message with an empty/default bounding box if only key points are set. Maybe add a field like shape_type (0 = bounding box, 1 = key points, 2 = polygon) or has_points (see the sketch after this list).
** There are other ways to model it, e.g. with a completely new message KeypointDetectionXD.
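
As a rough sketch of the shape_type idea from the last point (the constants and the shape_type/points fields are all hypothetical, not part of the current Detection2D), a client could branch on it like this:

```python
# Hypothetical client-side handling of the proposed shape_type field;
# none of these names exist in vision_msgs today.
SHAPE_BOUNDING_BOX = 0
SHAPE_KEYPOINTS = 1
SHAPE_POLYGON = 2

def interpret(det, keypoint_order=("left_eye", "right_eye", "mouth")):
    """Return the geometric payload of a (proposed, extended) Detection2D."""
    if det.shape_type == SHAPE_BOUNDING_BOX:
        return ("bbox", det.bbox)
    if det.shape_type == SHAPE_KEYPOINTS:
        # the agreed order carries the meaning of each point
        return ("keypoints", dict(zip(keypoint_order, det.points)))
    if det.shape_type == SHAPE_POLYGON:
        return ("polygon", [(p.x, p.y) for p in det.points])
    raise ValueError("unknown shape_type: %r" % (det.shape_type,))
```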

Let me know what you think about support for key points/polygons and the above extensions of Detection2D.

Closing due to inactivity.