add unique object ID for tracking

Question

add unique object ID for tracking

hakuturu583 opened this issue 5 years ago · comments

Hi, I want to make object detection (https://github.com/OUXT-Polaris/nnabla_vision_detection) and tracking ROS package and visialization tool(https://github.com/OUXT-Polaris/vision_msgs_visualization) using vision_msgs.
However, there is no object unique id in this package.
So,I want to modify this message like below.

Detection2D.msg

# Defines a 2D detection result.
#
# This is similar to a 2D classification, but includes position information,
#   allowing a classification result for a specific crop or image point to
#   to be located in the larger image.

Header header

# Class probabilities
ObjectHypothesisWithPose[] results

# 2D bounding box surrounding the object.
BoundingBox2D bbox

# The 2D data that generated these results (i.e. region proposal cropped out of
#   the image). Not required for all use cases, so it may be empty.
sensor_msgs/Image source_img

# If this message was tracking result, this field set true.
bool is_tracking

# Unique ID which was set by tracker ROS node.
uuid_msgs/UniqueID tracking_id

Detection3D.msg

# Defines a 3D detection result.
#
# This extends a basic 3D classification by including position information,
#   allowing a classification result for a specific position in an image to
#   to be located in the larger image.

Header header

# Class probabilities. Does not have to include hypotheses for all possible
#   object ids, the scores for any ids not listed are assumed to be 0.
ObjectHypothesisWithPose[] results

# 3D bounding box surrounding the object.
BoundingBox3D bbox

# The 3D data that generated these results (i.e. region proposal cropped out of
#   the image). This information is not required for all detectors, so it may
#   be empty.
sensor_msgs/PointCloud2 source_cloud

# If this message was tracking result, this field set true.
bool is_tracking

# Unique ID which was set by tracker ROS node.
uuid_msgs/UniqueID tracking_id

I want to use same message for tracking and detection, in order to use same ROS node for the visialization, so I do not want to add new message type.

Adam Allevato · Answer 1 · Fri May 31 2019 06:43:46 GMT+0800 (China Standard Time)

Hi, thanks for proposing a change! I agree that right now it is difficult to perform multi-object tracking using only vision_msgs so let's see if we can improve it.

Can you please clarify what the ID keeps track of in your implementation? Does each real-world object in the scene get its own UniqueID? Each detection message? Each image/detection event?

Masaya Kataoka · Answer 2 · Sat Jun 01 2019 08:09:26 GMT+0800 (China Standard Time)

I think pipelines like below is good for building object detection/tracking system with vision_msgs.
(https://docs.google.com/presentation/d/e/2PACX-1vTUnCyBg9Kxuya1BWyjTJcAf9lGJa_th5n-Ta2JxoHbiwgZS5lHWjOhvDsvzqMJB9s1Rey9wgOh59Md/pub?start=false&loop=false&delayms=3000)

Masaya Kataoka · Answer 3 · Sat Jun 01 2019 08:15:13 GMT+0800 (China Standard Time)

I think it is unnecessary to set unique ID in object detector.
In order to setting unique ID, time series processing is necessary.
I think detection node is one-shot algorithm such as yolo,voxelnet,SVM etc...
Tracking node contains time series processing algorithm such as particle filter, kalman filter...
So, I think it is better for us to set unique tracking ID in tracking node.

Masaya Kataoka · Answer 4 · Sat Jun 01 2019 08:15:59 GMT+0800 (China Standard Time)

In order to show the result is tracking result or detectino result cleary, I made is_tracking field.

Martin Günther · Answer 5 · Sun Jun 02 2019 17:37:32 GMT+0800 (China Standard Time)

+1

I agree this would be a valuable addition. The comments should be checked by a native speaker (@Kukanani ?), otherwise it looks good to me.

Masaya Kataoka · Answer 6 · Mon Jun 03 2019 09:55:05 GMT+0800 (China Standard Time)

I also sending PR for this issue. #19
If you are OK.
Pleas review it!!

Masaya Kataoka · Answer 7 · Thu Jun 06 2019 09:09:27 GMT+0800 (China Standard Time)

@Kukanani How do you think about my architecture design and message modification?

Adam Allevato · Answer 8 · Mon Jun 10 2019 23:58:51 GMT+0800 (China Standard Time)

@hakuturu583, I'm happy to move forward with getting this merged, but I will probably propose changes to the comments based on this discussion.

Just to clarify again: the tracking_id field should be the same across multiple messages, as long as those messages represent detections of the same real-world object. Is this correct?

Masaya Kataoka · Answer 9 · Tue Jun 11 2019 21:17:35 GMT+0800 (China Standard Time)

@Kukanani Yes, I think tracking_id of the same object should be same across multiple message as long as the tracking node continue tracking the target object.

mistermult · Answer 10 · Mon Jul 15 2019 18:39:18 GMT+0800 (China Standard Time)

I'm also interested in adding IDs.

Place of ID
However, I'm no satisfied with the solution #19. The ID should stay the same if the detected object is the same, i.e. the same entity. However, currently the ID is added to Detection2D.msg. So I cannot model the following: Object 123 is detected with probability of 50% and Object 234 is detected with probablity 50%.

Solution: Move ID from Detection2D.msg to ObjectHypothesis.msg/ObjectHypothesisWithPose.msg. Moreover, rename from tracking_id to object_id, because it describes that this is the same object entity.

Type of ID
See also #17 (comment)

Solution: Make the type of tracking_id/object_id and old id (=class id) string.

Both tracking_id and object_id should be of same type because they are foreign keys in similar collections. Class and entity are really similar: "Red cube" might be a unique object until there are multiple red cubes of exactly the same type on the table.

String:

Human readable
Also used in movit_msgs (e.g. CollisionObject) and all messages in of object_recognition_msgs
Can stand for itself without a database, e.g. just "apple"
Can put UUIDs, URLs (linked data, to wikipedia, gazebo), or Wikidata IDs (Q812880 for cube) into it

larger than Int (but the images are much larger, so it is negligible)
slower to (de)serialize, because of length field (but object detection is much slower)

no dependency on message of UUID

meaning not described by type. Could be a URL, a string of a UUID etc. (but the is can also be a benefit. Applications have to decide what it means).

UUID, Int:

not human readable
needs database lookup to get meaning

smaller (but see above)
faster to deserialize (but see above)
Type UUID exactly describes the meaning.

Proposed solution

Keep Detection2D.msg/Detection3D.msg as it is (or only add is_tracking). Modify ObjectHypothesis.msg (ObjectHypothesisWithPose.msg) as following:

# An object hypothesis that contains no position information.

# The unique ID of object detected. 
# If the two object_id's are the same in different messages for different images,
# it means that the same real object (entity) has been detected in both images.
# Object detection pipelines that do not output such IDs should set this to the
# empty string "".
string object_id

# The unique ID of class of the object detected [....]
string id #Change to string; 

# The probability or confidence value of the detected object. By convention,
#   this value should lie in the range [0-1].
float64 score

This also has the benefit that a classifcation (Classification2D.msg) now also can give detected object a ID because it includes ObjectHypothesis.

Please review my proposal @hakuturu583 , @Kukanani , @LeroyR, @mintar. I added some up/down buttons below. You might just want to vote.

Leroy Rügemer · Answer 11 · Tue Jul 16 2019 02:00:37 GMT+0800 (China Standard Time)

I understand where @mistermult is coming from, however:

Place of ID

The Detection it is still one entity(e.g. having one BB/Pointcloud) that we add to the planning scene, even if it has multiple hypothesis which specific class / track it is.
Which object_id is then set in the planning_scene/collision_object?
First Hypothesis? Hypothesis with the highest score? Set the same in all hypos?
-> we need the field on the detectionXd level if using the object based on the pointcloud

I can imagine a system that uses the best hypothesis as basis for e.g. Manipulation, adding the known mesh/model in the assumed orientation, but it may be always safer to base Manipulation on the sensor data.

Also: On the Hypothesis level you can already use the current messages by simply using the id field as the track id, as you currently have to lookup the label anyway.

e.g with rosparam:

object_tracker:
    0: apple

i still think detectionXd is the correct place

Type of ID

Agree, as long as moveit is using strings.

mistermult · Answer 12 · Tue Jul 16 2019 07:32:05 GMT+0800 (China Standard Time)

@LeroyR Thanks for the reply.

I have to clarify my proposal. My proposed object_id describes the object. Assume that each real object has a unique number written on (two identical red cubes would have different numbers). This would be the object_id. The object detection/tracking of course creates the numbers rarely arbitrarily.

Which object_id is then set in the planning_scene/collision_object? First Hypothesis? Hypothesis with the highest score? Set the same in all hypos?

We must differentiate between two thing:

BB (in the image) / clusters(in the point cloud) that are somehow connected to BB/clusters in the next frames (one id for BB and clusters). We might label the BB/cluster in all frames with the same id if they are somehow connected.
Real world objects. We might label the object with the same id in multiple frames if it is the same object (same entity).
These are often related but not the same: There is red blob moving from the to right in the image. The bounding boxes are related, so give them all the same id in different frames. First it is detected as a apple (id="apple", object_id=1). Then the tracker the classifies it wrongly as a orange that hasn't been seen yet (id="orange", object_id=10 (new id, because it is a new entity. A apple cannot be the same as a orange). 5 Frames later it is detected again as the apple (id="apple", object_id=1). See also the modelling problem below.
If you want to do collision avoidance you might want add the whole cluster to your collision model to make sure not to collide with any point (your all hypos option). You only care about the cluster. Add the BB/cluster id.
If you want to add a saved mesh of the detected object, you already have to choose one hypothesis: Different hypothesis might have different (class_)id(s), which describe different object classes, which have different meshes. So you must decide which mesh to add (with the object_id) because different classes might have really different meshes (large or small). You might want to add all meshes (each with a different object_id) and group them in a group with the id of the BB/cluster.

Problem when using only the tracking_id in ClassificationX

Assume that:

There is a apple and a orange on the table.
I have a OpenCV tracker that tracks both fruits in the image. This is good at tracking but bad at classification.
Intially it emits:

Detection2D = [ 
detections: [
{
    results: [
        {
            id: "apple",
            object_id: 1, #the left cube
            score: 1.0,
            pose:...,
       }
    ]
    bbox = ...,
    source_img = ...
},
{
    results: [
        {
            id: "orange",
            object_id: 2, #the left cube
            score: 1.0,
            pose:...,
       }
    ]
    bbox = ...,
    source_img = ...
}]

This would also work with object_id/trackig_id in the message. Then I slowly move both fruits into the middle of the table. If the two objects touch, I remove one fruit quicly. The tracker is good at tracking but bad at classification. So it knows that there is one known fruit (so only one bounding box), which must be a apple or a orange. So its object_id is 1 or 2, but does not know the class. So it must emit:

Detection2D = [ 
detections: [
{
    results: [
        {
            id: "apple",
            object_id: 1, #the left cube
            score: 0.5,
            pose:...,
       },
       {
            id: "orange",
            object_id: 2, #the left cube
            score: 0.5,
            pose:...,
       }
    ]
    bbox = ...,
    source_img = ...
}]

Now assume that we use tracking_id in Classification2D instead of object_id in ObjectHypothesis. There are two potential IDs (1 or 2). So there must be two Classification2D. I conclude:

Detection2D = [ 
detections: [
{
    results: [
        {
            id: "apple",
            object_id: 1, #the left cube
            score: 0.5,
            pose:...,
       }
    ]
    bbox = ...,
    source_img = ...
},
{
    results: [
        {
            id: "orange",
            object_id: 2, #the left cube
            score: 0.5,
            pose:...,
       }
    ]
    bbox = ...,
    source_img = ...
}]

There are multiple problems:

There are 2 bounding boxes, but there is definitively only one.
The scores in each result array do not add up to 1.
It doesn' solve your problem which object to add. Indeed, it makes it even more problematic, because now the detected objects are now even grouped by bounding box.

Also: On the Hypothesis level you can already use the current messages by simply using the id field as the track id, as you currently have to lookup the label anyway.

Assume I track 2 apples. As clarified by @Kukanani, id describes the class. So id="apple" for both apples. To track the apples across multiple frames I have to identify each apple with a unique additional ID: the object_id. So I have (id="apple", object_id=1) for the left apple and (id="apple", object_id=2) for the right apple.

mistermult · Answer 13 · Tue Jul 16 2019 07:47:14 GMT+0800 (China Standard Time)

In conclusion:

tracking_id in ClassificationX:: If the tracking_id (example: bbox1) is the same across multiple frames the BB/cluster are related/ the 'same' (of course they are at slightly different positions).
object_id in ObjectHypothesis:: If the object_id (example: orange1) is the same across multiple frames the same object (entity) has been detected across multiple frames.

Advantages of object_id in ObjectHypothesis:

ObjectHypothesis::object_id is more granular than ClassificationX::tracking_id. If the granularity is not needed set the object_id the same in all hypothesis.
In moveit_msgs/CollisionObject there is type id and entity id, too.

Martin Günther · Answer 14 · Thu Jul 18 2019 19:26:31 GMT+0800 (China Standard Time)

I'm currently on vacation and on mobile, so apologies beforehand for being brief.

I prefer track_id and class_id to be strings, for the reasons listed.

I also believe track_id should go into DetectionXd. We should not add object_id to ObjectHypothesis, IMO. Reason: track_id has a clearly defined meaning - it associates this DetectionXd to one from the previous frame. The examples cited by @mistermult go beyond tracking; this is called "anchoring" (Saffiotti et al.). I admit that the example with the two tracked objects assigned to one single detection cannot be modeled elegantly without adding object_id to the hypotheses; however, I would argue that this is a special case anyway. In general, we will anyway have to model it the way that @mistermult mentioned later: each (potential) object becomes one DetectionXd, with a unique track_id (if tracked); if there is uncertainty about the object class, that is modeled via the Object Hypotheses. If the tracker is using MHT (Multi Hypothesis Tracking) or equivalent, it is probably best to do that internally and only publish the most likely hypothesis (here: assignment of objects to tracks). Or it could publish each "hypothesis" (in the MHT sense of the word) as a separate DetectionXdArray if desired.

Martin Günther · Answer 15 · Thu Jul 18 2019 19:34:21 GMT+0800 (China Standard Time)

Example of why I think the example above is a special case: assume there are two real objects (apple and orange), and two detected objects (o1 and o2). Now assume the tracker doesn't know which one is which. If you simply model o1 = (apple|orange), o2 = (apple|orange), you don't express the constraint that the combination o1=apple, o2=apple is invalid. Much better to publish two separate DetectionXdArrays: o1=apple, o2=orange | o1=orange, o2=apple, or even just the most likely.

mistermult · Answer 16 · Fri Jul 19 2019 16:54:31 GMT+0800 (China Standard Time)

@mintar The most correct version would be, that the tracker publishes multiple DetectionXdArrays if it is uncertain. However, in this case the score would have to be at DetectionXdArrays. But if the tracker cannot differentiate n objects, it would have to publish n! messages.

mistermult · Answer 17 · Fri Jul 19 2019 17:00:36 GMT+0800 (China Standard Time)

I see that I cannot find support for a object_id in ObjectHypothesis. I still thing this will bite us in the future. Nevertheless, I suggest that we are going forward with the majority:

Add track_id (which seems to be a better name that tracking_id) of type string to DetectionXd.

Martin Günther · Answer 18 · Sat Jul 20 2019 04:35:19 GMT+0800 (China Standard Time)

The most correct version would be, that the tracker publishes multiple DetectionXdArrays if it is uncertain. However, in this case the score would have to be at DetectionXdArrays.

Yes. There is no need to add the score to DetectionXdArray. Simply add a message that has an array of DetectionXdArray and an array of scores. If you want to go down this road, I suggest you create such a message outside of vision_msgs. Once it has proved useful in a real implementation, it could be merged into vision_msgs.

But if the tracker cannot differentiate n objects, it would have to publish n! messages.

Correct. This is in the nature of the problem. Pretending that the probabilities are independent simplifies the problem, but is wrong. MHT solves the problem by only keeping track of a fixed number of hypotheses, not all possible ones, like a particle filter.

I see that I cannot find support for a object_id in ObjectHypothesis. I still thing this will bite us in the future. Nevertheless, I suggest that we are going forward with the majority:

Add track_id (which seems to be a better name that tracking_id) of type string to DetectionXd.

Agreed. We can still add object_id to the hypotheses later if we find a compelling and common use case.

Adam Allevato · Answer 19 · Sat Jul 20 2019 06:17:00 GMT+0800 (China Standard Time)

See #19, now merged, for tracking on Detection messages. Please re-open the issue if and when we need to revisit.

Masaya Kataoka · Answer 20 · Mon Aug 05 2019 10:57:34 GMT+0800 (China Standard Time)

@Kukanani
@mistermult
I feel verry sorry to my late reply.
I strongly disagree with using string in tracking_id.
So, I propose to revert change in #22
It makes it is unnecessary for humans to check tracking ID directory.
It is just a visualization problem.
I am now developing visualization nodes for this message.
So, it will be no problem soon.
If you want to treat tracking result from a multiple tracker nodes, we have to check the batting of the tracking ID.

Masaya Kataoka · Answer 21 · Mon Aug 05 2019 11:02:09 GMT+0800 (China Standard Time)

@mistermult The reason why I use uuid_msgs is users should recoganize the filed is UUID corectly.

Masaya Kataoka · Answer 22 · Mon Aug 05 2019 11:23:37 GMT+0800 (China Standard Time)

@Kukanani @mistermult @LeroyR I failed to reopen this issue.
Can you discuss here?? #25