Open-Debin / Emotion-FAN

ICIP 2019: Frame Attention Networks for Facial Expression Recognition in Videos

about self-attention and relation-attention

oukohou opened this issue

In your code Demo_AFEW_Attention.py, it seems that self-attention and relation-attention cannot be used simultaneously?

at_type = ['self-attention', 'relation-attention'][args.at_type]
print('The attention is ' + at_type)

This seems different from your paper:
(screenshot from the paper)

If so, why?

Thanks for your comment. In the code, 'relation-attention' means using both 'self-attention' and 'relation-attention', because 'relation-attention' is based on a global feature that is the output of 'self-attention'.

The name 'relation-attention' here is ambiguous; thank you for your question, I will modify it.
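
For clarity, here is a minimal sketch (hypothetical class and layer names FrameAttentionSketch, alpha_fc, beta_fc; not the repository's exact code) of how the two stages compose when at_type is 'relation-attention': self-attention first produces per-frame weights and a global feature, and relation-attention then re-weights each frame against that global feature.

import torch
import torch.nn as nn

class FrameAttentionSketch(nn.Module):
    """Sketch only: the 'relation-attention' mode runs self-attention first
    and builds relation-attention on top of its global feature."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.alpha_fc = nn.Linear(feat_dim, 1)     # self-attention scores (hypothetical name)
        self.beta_fc = nn.Linear(feat_dim * 2, 1)  # relation-attention scores (hypothetical name)

    def forward(self, frame_feats):                # frame_feats: [num_frames, feat_dim]
        # Stage 1: self-attention -> per-frame weights and a global video feature.
        alphas = torch.sigmoid(self.alpha_fc(frame_feats))          # [num_frames, 1]
        global_feat = (frame_feats * alphas).sum(0) / alphas.sum()  # [feat_dim]

        # Stage 2: relation-attention re-weights each frame by comparing it
        # with the global feature produced by stage 1.
        paired = torch.cat([frame_feats, global_feat.expand_as(frame_feats)], dim=1)
        betas = torch.sigmoid(self.beta_fc(paired))                 # [num_frames, 1]
        video_feat = (paired * alphas * betas).sum(0) / (alphas * betas).sum()
        return video_feat                                           # [2 * feat_dim]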

Understood, thanks!

Sorry to bother you again, but the code logic is rather complicated to me, so I'd better ask another question:
If I understand correctly, in your Demo_AFEW_Attention.py, the function validate() only runs inference on one single image instead of all three images, which is different from the training flow?

for i, (input_var, target, index) in enumerate(val_loader):
    # compute output
    target = target.cuda(async=True)
    input_var = torch.autograd.Variable(input_var)
    ''' model & full_model'''
    f, alphas = model(input_var, phrase='eval')

    pred_score = 0
    output_store_fc.append(f)
    output_alpha.append(alphas)
    target_store.append(target)
    index_vector.append(index)

    # measure elapsed time
    batch_time.update(time.time() - end)
    end = time.time()

If so, why is that?
And what does the index_matrix actually do?
Why is the eval procedure different from the train procedure?

Thanks in advance!

Hello, the method is consistent with the paper: each prediction for a video comes from running inference on all frames of that video. The index_matrix tells which frame belongs to which video; you can check its shape, which is [num_of_videos, num_of_frames_in_entire_database]. I hope my answer helps you.
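
For illustration, here is a minimal sketch (hypothetical names aggregate_by_video, frame_scores, frame_video_ids; not the repository's exact code) of how an index matrix of that shape can pool the frame-level outputs collected in validate() into one prediction per video. A plain average is used here to keep the sketch simple; the actual aggregation in the repo may differ (e.g. it applies the attention weights described above).

import torch

def aggregate_by_video(frame_scores, frame_video_ids, num_videos):
    # frame_scores:    [num_frames, num_classes], stacked outputs over the whole val set
    # frame_video_ids: LongTensor [num_frames], video index of every frame
    num_frames = frame_scores.size(0)

    # index_matrix[v, f] = 1 iff frame f belongs to video v
    index_matrix = torch.zeros(num_videos, num_frames)
    index_matrix[frame_video_ids, torch.arange(num_frames)] = 1.0

    # Row-normalised matmul = average the scores of each video's own frames.
    per_video_scores = (index_matrix @ frame_scores) / index_matrix.sum(1, keepdim=True)
    return per_video_scores.argmax(dim=1)  # predicted class per video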
Thanks for your interest in my project; could you please give it a star? Thanks.

@oukohou
Merry Christmas! I recently updated Emotion-FAN; new features include data processing, environment installation, CK+ code, baseline code, and more detailed instructions. You can also find the old-version directory of Emotion-FAN in the README.md. I hope my new updates help you greatly. Please see the Emotion-FAN repository for more details.