Feature Request: Allow the linking of duplicate images

moi90 opened this issue 6 months ago · comments

Simon-Martin Schröder commented 6 months ago

With the LOKI (and surely other instruments as well), there are sometimes periods during which the sample does not move past the camera, so objects are photographed multiple times.

Currently, we use the "dubious" status to mark such images.

It would be nice to "link" duplicate objects together so that they can be annotated together.

A weaker form of this feature request would be to just mark objects as duplicates (not "duplicate of X").

grololo06 · Answer 1 · Sat Feb 24 2024 13:38:54 GMT+0800 (China Standard Time)

Hello, in theory, in EcoTaxa, an object is a unique physical object, and it's possible to attach several image views to it. E.g. https://ecotaxa.obs-vlfr.fr/objectdetails/163337544?w=1130&h=848
So we could 'easily' attach your second/third/fourth... images, like above. But I think I remember that @jiho had an argument against this solution, maybe related to associated data.

Simon-Martin Schröder · Answer 2 · Sun Feb 25 2024 17:00:31 GMT+0800 (China Standard Time)

Yes, I know that you can save multiple images per object.
But I'm not a fan of this solution, either: different views of the same object can have different feature vectors and metadata.
Additionally, it would be hard to separate these images later if it turns out that they are not duplicates after all.
It would be great to be able to easily create and delete these relationships.

For the moment, another application external to EcoTaxa might be the way to go, as this feature might be far in the future (or never happen). But it might make sense to think this through, in case you want to implement something like this some day.

I would add an object_master_id field (or a similarly named one) and store in it the object ID of the object that serves as the representative of a group of duplicates.
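To make the idea concrete, here is a minimal sketch of how such a field could behave. It is not EcoTaxa code: `ObjectRecord`, `link_duplicate` and `unlink_duplicate` are hypothetical names, and the in-memory dictionary only stands in for whatever storage would really be used. The one design assumption is that groups stay flat, i.e. every duplicate points directly to the representative.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ObjectRecord:
    object_id: int
    taxon: str
    # Hypothetical field: ID of the representative object,
    # or None if this object is not marked as a duplicate.
    object_master_id: Optional[int] = None


def link_duplicate(objects: Dict[int, ObjectRecord],
                   duplicate_id: int, master_id: int) -> None:
    """Mark `duplicate_id` as a duplicate of `master_id`."""
    # If the chosen master is itself a duplicate, use its representative.
    rep = objects[master_id].object_master_id
    if rep is None:
        rep = master_id
    if rep == duplicate_id:
        raise ValueError("an object cannot be a duplicate of itself")
    # Re-point objects that used `duplicate_id` as their representative,
    # so that no chains of duplicates appear.
    for obj in objects.values():
        if obj.object_master_id == duplicate_id:
            obj.object_master_id = rep
    objects[duplicate_id].object_master_id = rep


def unlink_duplicate(objects: Dict[int, ObjectRecord],
                     duplicate_id: int) -> None:
    """Remove the duplicate relationship; the object is untouched otherwise."""
    objects[duplicate_id].object_master_id = None
```

Because each object keeps its own images, features and metadata, unlinking is just clearing one field, which is the "easy to create and delete" property asked for above.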

Jean-Olivier Irisson · Answer 3 · Tue Feb 27 2024 08:23:09 GMT+0800 (China Standard Time)

This is different from the various views of the same object indeed*; this relates to artefacts of the imaging system that takes several images of the same physical object (because it was stuck in the field of view). The difficulty arises when those must be counted: sometimes they should count as 1, not n (in the case of an object stuck on the glass in the FlowCam); sometimes they should actually be counted as n (same object visible on several frames of the UVP: the water volume is counted several times so the object should be counted several times too, to keep the concentration correct).
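A small worked example of the two cases, with made-up numbers (3 images, 1.0 L of sampled volume added per image); the function name and the values are purely illustrative:

```python
def concentration(n_counted: int, sampled_volume_l: float) -> float:
    """Concentration in objects per litre."""
    if sampled_volume_l <= 0:
        raise ValueError("sampled volume must be positive")
    return n_counted / sampled_volume_l


# Hypothetical numbers: 3 images, each adding 1.0 L to the sampled volume.
n_images = 3
sampled_volume_l = n_images * 1.0

# UVP-like case: the same water (with the same object) is imaged 3 times,
# so the volume is inflated 3x and the count must be inflated 3x as well.
uvp_keep_all = concentration(n_images, sampled_volume_l)  # 3 objects / 3 L
uvp_keep_one = concentration(1, sampled_volume_l)         # underestimates

# FlowCam-like case: 3 different parcels of water pass by while one object
# stays stuck on the glass, so there is only 1 real object in 3 L.
flowcam_keep_one = concentration(1, sampled_volume_l)         # 1 object / 3 L
flowcam_keep_all = concentration(n_images, sampled_volume_l)  # overestimates
```

In this sketch the same images give the right answer with opposite rules, which is why the decision cannot be made once for all instruments.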

Our "dirty" solution to this is to place one object in the correct taxon and, when relevant, all the others in the taxon "duplicate" (and several "duplicate" taxa could be branched in various places of the tree, if need be). This is dirty but generic and discoverable (it's just another taxon). I don't know how much better another solution could be. It could be an additional flag, exported with the data, but I would bet an arm that many people won't pay attention to it and will compute concentrations that include the duplicates when they should not (the current solution prevents this).

FYI, a feature soon to be released will be to look for objects based on visual similarity, which will allow, at least, to quickly detect those duplicates.

[*]: As for how to treat those: they should definitely be linked to the same object. The feature vectors associated with the object could hold data from the various views, and CNNs could be made to take advantage of those views. Currently no network does this, but it is not very complicated (and particularly simple if there are 3 or fewer views, since the pre-trained architectures that work with RGB can be leveraged).
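One way to read the "3 or fewer views" remark is to put each view in one channel of a 3-channel image, so that a network expecting RGB input can consume it. A rough sketch with NumPy follows; `stack_views` is a hypothetical helper, and it assumes that all views are single-channel arrays of the same size and that missing views can be padded with zeros.

```python
import numpy as np


def stack_views(views, n_channels=3):
    """Stack 1..n_channels single-channel views into an (H, W, n_channels) array."""
    if not 1 <= len(views) <= n_channels:
        raise ValueError(f"expected 1 to {n_channels} views, got {len(views)}")
    shape = views[0].shape
    if any(v.shape != shape for v in views):
        raise ValueError("all views must have the same shape")
    # Assumption: unused channels are filled with zeros.
    padding = [np.zeros(shape, dtype=views[0].dtype)
               for _ in range(n_channels - len(views))]
    return np.stack(list(views) + padding, axis=-1)
```

Whether a pre-trained network gives useful features on such an input is an open question here; the sketch only shows the data layout.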

Simon-Martin Schröder · Answer 4 · Tue Feb 27 2024 18:12:47 GMT+0800 (China Standard Time)

> this relates to artefacts of the imaging system that takes several images for the same physical object (because it was stuck in the field of view)

Right, that's what I'm talking about.

> Our "dirty" solution to this is to place one object in the correct taxon and, when relevant, all the others in the taxon "duplicate".

I'm not opposed to this solution. (Although, in the limit, you get a "duplicate" sub-category for each existing category.) But you still don't know which other object an object is a duplicate of. Hence my proposal of an object_master_id field.

> I would bet an arm that many people won't pay attention to it and will compute concentrations that include the duplicates when they should not (the current solution prevents this).

I'm not sure about that. Even now, people have to be aware that duplicates can occur and have to actively think about whether they need to be counted or not. Once they have to think about that, it is only a small step to actively exclude duplicates from the analysis. You could even make it an export option:

"[ ] Include duplicates" => If not checked, exclude objects with a non-empty `object_master_id`.

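A minimal sketch of what that export option could do, assuming exported rows are plain dictionaries and that the field is called `object_master_id` (both are assumptions, not the existing EcoTaxa export):

```python
from typing import Any, Dict, Iterable, List


def filter_export(rows: Iterable[Dict[str, Any]],
                  include_duplicates: bool = False) -> List[Dict[str, Any]]:
    """Return the rows to export, dropping marked duplicates by default."""
    if include_duplicates:
        return list(rows)
    # A row is a duplicate if its (hypothetical) object_master_id is non-empty.
    return [row for row in rows
            if row.get("object_master_id") in (None, "")]
```

With the box unchecked by default, someone who ignores the field would still get counts without duplicates, which seems closest to what the "duplicate" taxon achieves today.
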
> FYI, a feature soon to be released will be to look for objects based on visual similarity, which will allow, at least, to quickly detect those duplicates.

Cool! (And who knows, maybe the next logical step will be to link them, as proposed :) )

> different from the various views of the same object [...] they should definitely be linked to the same object, [...] CNNs could be made to take advantage of those various views. [...] pre-trained architectures that work with RGB can be leveraged.

I don't think it is that simple in this case. Different images per object are often the output of different processing steps and not actually different "views", e.g. raw RGB and cleaned-up grayscale.