movienet / movienet-tools

Tools for movie and video research

Home Page: http://movienet.github.io

Extract Action Features from a single frame

albertaparicio opened this issue

I am having major problems trying to extract action features from a single frame.

I am trying to build a frame classifier that takes as input the action features produced by this project's model.

However, I cannot find a way to do this.

I have tried using the PersonDetector (not the ParallelPersonDetector) on a batch of images (a tensor of shape [batch_size, n_channels, height, width]), but I ran into a myriad of issues, most of them stemming from the fact that the code assumes it is working with a single image. I would have to rewrite almost the whole detector from scratch.
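
One workaround in that situation is to iterate over the batch and run the single-image detector on each frame. Below is a minimal sketch of that approach; it assumes the batch is a PyTorch tensor holding unnormalized uint8-range BGR pixel values (the layout mmcv.imread produces) and reuses the PersonDetector calls shown in the reply further down, so adjust the conversion if your batch is preprocessed differently.

import numpy as np
from movienet.tools import PersonDetector

detector = PersonDetector()

def detect_batch(batch):
    """Run the single-image detector on each frame of a [B, C, H, W] tensor."""
    results = []
    for frame in batch:                       # frame: [C, H, W]
        # CHW tensor -> HWC uint8 numpy array, the format a single-image detector expects
        img = frame.permute(1, 2, 0).cpu().numpy().astype(np.uint8)
        results.append(detector.detect(img))  # per-frame person bboxes
    return results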

I gave up on the idea of processing batches of images and instead focused on processing a single image at a time. After altering the PersonDetector, which does not seem to have been updated to the same point as the ParallelPersonDetector, I managed to get a single frame's person detections.

However, it all went downhill when I tried using the ActionExtractor module on that frame with those person features. It looks like the ActionExtractor expects a sequence of images, but I cannot figure out how this sequence is supposed to be represented, nor how I am supposed to represent the person features of each frame in the sequence.

All I am asking for is a simple way to input a frame into the action model and get the same features that I get for each frame when I run the extract_action_feats script.

Exploring this project, all I have found is missing data and convoluted code that (in my opinion) is unnecessarily complicated. It looks good on the outside and in the demo code, but once you get under the hood it is very hard to do anything.

I hope we can find a way to resolve this so that not only I, but anyone, can benefit from this project, which I am sure has many great things to offer.

You could try this code:

import mmcv
from movienet.tools import PersonDetector, ActionExtractor

img = mmcv.imread('path/of/your/image')  # read a single frame as an HWC BGR numpy array

# detect person bboxes from the single image
detector = PersonDetector()
bbox = detector.detect(img)

# if bbox is not None, you can extract action features from the single frame as follows
# if your input bbox is not normalized to [0, 1), remember to set require_normalized_bbox=False
extractor = ActionExtractor(require_normalized_bbox=False)
result = extractor.extract([img], bbox)
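
As a follow-up to the snippet above: if you prefer to keep require_normalized_bbox at its default and pass normalized boxes instead, you could divide the pixel coordinates by the image size before calling the extractor. This is only a sketch; it assumes bbox is an N x 4 (or N x 5 with a trailing score column) array in (x1, y1, x2, y2) pixel coordinates, and that the default ActionExtractor expects boxes normalized to [0, 1) — check both against the detector's actual output and the extractor's documentation.

import numpy as np

# continues from the snippet above: img and bbox are already defined
h, w = img.shape[:2]
norm_bbox = np.array(bbox, dtype=np.float32)
norm_bbox[:, [0, 2]] /= w      # x1, x2 -> [0, 1)
norm_bbox[:, [1, 3]] /= h      # y1, y2 -> [0, 1)

extractor = ActionExtractor()  # assumed default: normalized boxes expected
result = extractor.extract([img], norm_bbox)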

I see, thank you. I did not know about either the require_normalized_bbox parameter or how to pass the data into the ActionExtractor.

With the code you shared, reading the image with mmcv.imread, I was able to extract features for a single image.
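
For anyone landing here with the same goal (per-frame action features for a classifier), the pieces confirmed in this thread can be combined into a small loop over frame images. This is a sketch only: the frames directory and glob pattern are placeholders, and result is simply whatever ActionExtractor.extract returns for one frame.

import glob

import mmcv
from movienet.tools import PersonDetector, ActionExtractor

detector = PersonDetector()
extractor = ActionExtractor(require_normalized_bbox=False)

features = {}
for path in sorted(glob.glob('path/to/frames/*.jpg')):  # placeholder pattern
    img = mmcv.imread(path)             # single frame as an HWC BGR numpy array
    bbox = detector.detect(img)         # person bboxes for this frame
    if bbox is None or len(bbox) == 0:  # skip frames with no detected persons
        continue
    features[path] = extractor.extract([img], bbox)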