Same narrations but different noun_class in two videos
JiankunW opened this issue
Hi, I am new to this cool dataset. Consider this example:
narration_id | participant_id | video_id | narration_timestamp | start_timestamp | stop_timestamp | start_frame | stop_frame | narration | verb | verb_class | noun | noun_class | all_nouns | all_noun_classes | action_class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
P02_09_238 | P02 | P02_09 | 00:17:10.170 | 00:17:08.04 | 00:17:09.27 | 61682 | 61756 | pick up slice | pick-up | 0 | onion | 16 | ['onion'] | [16] | 0_16 |
P27_105_221 | P27 | P27_105 | 00:12:37.401 | 00:12:37.45 | 00:12:40.48 | 37872 | 38024 | pick up slice | pick-up | 0 | slice:ham | 156 | ['slice:ham'] | [156] | 0_156 |
I am confused that the two videos share the same narration but have different noun_class values. According to the paper, the verb and nouns are parsed from the narrations, so how could two different nouns come from the same narration?
I also noticed that the two videos are from EPIC-55 and EPIC-100 respectively. Maybe the reason lies in how the data was collected.
Hi, thanks for your question.
When mapping the noun parsed from the narration to the noun we label, we replace it if the original noun is too generic, either by propagating the noun from previous actions or via manual inspection. In this case, 'slice' doesn't tell you the type of object, only that it is a slice of something, so we replaced it with 'onion' and 'slice of ham' respectively.
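For anyone who wants to see how often a narration ends up with more than one noun class, here is a minimal sketch. It assumes the annotations are available as CSV rows with the column names shown in the table above; the function name `ambiguous_narrations` is hypothetical and not part of any official tooling, and the inline sample contains only the two rows quoted in this issue.

```python
import csv
import io
from collections import defaultdict

# The two annotation rows quoted in this issue, reduced to the relevant columns.
SAMPLE_CSV = """narration_id,video_id,narration,verb_class,noun,noun_class
P02_09_238,P02_09,pick up slice,0,onion,16
P27_105_221,P27_105,pick up slice,0,slice:ham,156
"""


def ambiguous_narrations(rows):
    """Map each narration to its (noun, noun_class) pairs, keeping only
    narrations that were labelled with more than one noun class."""
    labels = defaultdict(set)
    for row in rows:
        labels[row["narration"]].add((row["noun"], int(row["noun_class"])))
    return {
        narration: sorted(pairs)
        for narration, pairs in labels.items()
        if len({noun_class for _, noun_class in pairs}) > 1
    }


# Parse the sample rows and list the narrations with conflicting noun classes.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
result = ambiguous_narrations(rows)
for narration, pairs in result.items():
    print(narration, pairs)
```

To run this on the full annotations, replace the inline sample with a `csv.DictReader` over the annotation file; the grouping logic stays the same as long as the `narration`, `noun` and `noun_class` columns are present.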
Hopefully this answers your question,
Michael
@mwray How did you decide whether the original noun is too generic? BTW, is this process described in any of your papers?
Thanks for your rapid reply.
Jiankun
All the details are in our papers. Please read both the IJCV and PAMI papers for details. One direct pointer to get you going (explanations also exist elsewhere) is Sec 3.4 in our PAMI paper: https://ieeexplore.ieee.org/document/9084270
The IJCV supplementary material also contains additional details.