Narrations with multiple noun annotations for Action Recognition
gpantaz opened this issue
Hi,
Thank you for all of your work. I would like to clarify how a model is evaluated on the Action Recognition benchmark when a narration has multiple nouns. I can see that we are expected to submit a JSON file with the following format:
Where each narration has verb and noun entries that rank the verb and noun classes. Can a narration have multiple nouns, as in the train and validation sets, or should we only care about the noun labelled as the noun class? To make this clearer, here is an example from the train set:
P01_01_104,P01,P01_01,00:08:07.610,00:08:08.38,00:08:09.12,29302,29347,put container on top of counter,put-on,1,container,21,"['container', 'top:counter']","[21, 42]"
In this example, are we expected to predict both 21 and 42 as the noun classes, or only 21?
Apologies if this has been asked before; I couldn't find anything related.
Best,
George
Thanks for your question. For the action recognition challenge, we parse the sentence to identify the main noun related to the verb, here "put". We only consider that main noun, i.e. container, as the correct noun for the action. Other nouns, including the counter here, are treated as incorrect.
So I can confirm that for this challenge only 21 should be predicted.
Note that this is also available in the example above as:
put-on,1,container,21
This is all the information the action recognition challenge uses for both training and evaluation.
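As a rough illustration of the point above, here is a minimal sketch that parses the example row and keeps only the main verb and noun classes as the target. The column names in `COLUMNS` are an assumption inferred from the row layout (the thread does not list them), and `parse_row`, `action_recognition_target` and `noun_in_top_k` are hypothetical helpers, not part of any official evaluation code.

```python
import ast
import csv
import io

# Assumed column names, inferred from the layout of the example row.
COLUMNS = [
    "narration_id", "participant_id", "video_id", "narration_timestamp",
    "start_timestamp", "stop_timestamp", "start_frame", "stop_frame",
    "narration", "verb", "verb_class", "noun", "noun_class",
    "all_nouns", "all_noun_classes",
]

# The example row from the question, copied verbatim.
ROW = (
    'P01_01_104,P01,P01_01,00:08:07.610,00:08:08.38,00:08:09.12,29302,29347,'
    'put container on top of counter,put-on,1,container,21,'
    '"[\'container\', \'top:counter\']","[21, 42]"'
)


def parse_row(line):
    """Split one CSV line (respecting quoted fields) into a dict."""
    values = next(csv.reader(io.StringIO(line)))
    return dict(zip(COLUMNS, values))


def action_recognition_target(record):
    """Return (verb_class, noun_class): only the main noun is used."""
    return int(record["verb_class"]), int(record["noun_class"])


def noun_in_top_k(ranked_noun_classes, record, k=1):
    """Hypothetical check: is the main noun class among the top-k predictions?"""
    return int(record["noun_class"]) in ranked_noun_classes[:k]


record = parse_row(ROW)
target = action_recognition_target(record)
# The extra nouns are still in the row, but are not part of the target.
all_noun_classes = ast.literal_eval(record["all_noun_classes"])

print(target)            # (1, 21)
print(all_noun_classes)  # [21, 42]
print(noun_in_top_k([42, 21, 7], record, k=1))  # False: 42 is not the main noun
```

In this sketch, ranking 42 first counts as a miss at k=1 even though "counter" appears in the narration, which matches the explanation that other nouns are treated as incorrect.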
The reason for this choice is that narrations do not consistently include the target location, e.g. "on the counter". Some narrations only state "put down container" or "put container". It was therefore safer to ignore the additional details of the narration, but we make them available and use them in other challenges, such as the Multi-Instance Retrieval challenge.