jpthu17 / HBI

[CVPR 2023 Highlight] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning

Banzhaf Interaction questions

chchshshhh opened this issue

Thank you for your excellent work!
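I have one question: is the assignment "banzhaf[:, i, j] = self.banzhaf_interaction(retrieve_logits, text_mask, video_mask, text_weight, video_weight, i, j)" in the following code missing a plus sign? As written, each of the self.num iterations overwrites the previous result instead of adding to it, so the later division by self.num does not yield an average.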

for i in range(self.t_len):
    for j in range(self.v_len):
        for _ in range(self.num):
            banzhaf[:, i, j] = self.banzhaf_interaction(retrieve_logits, text_mask, video_mask,
                                                        text_weight, video_weight, i, j)
banzhaf = banzhaf / self.num
banzhaf = torch.einsum('btv,bt->btv', [banzhaf, text_mask])
banzhaf = torch.einsum('btv,bv->btv', [banzhaf, video_mask])
return banzhaf

Yes, thank you very much for pointing out our mistake. We have fixed this bug.
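
For readers who want to see why the missing plus sign matters, here is a minimal, self-contained sketch in plain Python. It does not reproduce the repository's code or its actual fix. It assumes that each call to banzhaf_interaction returns one sampled estimate, and it replaces that call with a hypothetical stand-in, noisy_estimate, whose true value is 2.0. The function names and the noise range are illustrative assumptions only.

```python
import random


def noisy_estimate(rng, true_value=2.0):
    """Hypothetical stand-in for one sampled call to banzhaf_interaction."""
    return true_value + rng.uniform(-0.5, 0.5)


def average_with_assignment(num, seed=0):
    """Mirrors the loop quoted in the question: '=' overwrites on every pass."""
    rng = random.Random(seed)
    value = 0.0
    for _ in range(num):
        value = noisy_estimate(rng)  # only the last sample survives
    return value / num  # last sample divided by num, not an average


def average_with_accumulation(num, seed=0):
    """Mirrors the '+=' variant suggested in the question."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num):
        total += noisy_estimate(rng)  # samples are summed
    return total / num  # mean of num samples


if __name__ == "__main__":
    print("assignment:  ", average_with_assignment(100))
    print("accumulation:", average_with_accumulation(100))
```

With num = 100, the accumulating version stays close to the true value of 2.0, while the assigning version returns a single sample divided by 100, so it is roughly two orders of magnitude too small. In the tensor setting of the question, the analogous change would presumably be to write banzhaf[:, i, j] += ... inside the innermost loop, assuming banzhaf is initialized to zeros beforehand.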