PKU-YuanGroup / Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Paper: https://arxiv.org/abs/2311.08046

Question about TCBlock

HYUNJS opened this issue · comments

  • In the original version, cross-attention was performed within the TCBlock. However, our experiments showed that this operation significantly destabilized training, so we removed the cross-attention from the TCBlock.
  • We did not run a full ablation of the MLP, but LLaVA reported that an MLP projector works better than a single linear layer.
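For reference, the linear-vs-MLP projector choice mentioned above can be sketched as follows. This is a minimal illustration, not the repo's actual code; the dimensions (1024 visual features, 4096 LLM hidden size) and the GELU activation are assumptions based on common LLaVA-style setups.

```python
import torch
import torch.nn as nn

# Assumed dimensions for illustration only.
vision_dim, llm_dim = 1024, 4096

# Option 1: a single linear projection from visual features to the LLM space.
linear_proj = nn.Linear(vision_dim, llm_dim)

# Option 2: an MLP projector (two linear layers with a GELU in between),
# the design LLaVA reported to work better than a plain linear layer.
mlp_proj = nn.Sequential(
    nn.Linear(vision_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)

# Both map (batch, num_visual_tokens, vision_dim) -> (batch, num_visual_tokens, llm_dim).
tokens = torch.randn(2, 64, vision_dim)
print(linear_proj(tokens).shape)  # torch.Size([2, 64, 4096])
print(mlp_proj(tokens).shape)     # torch.Size([2, 64, 4096])
```

Either projector is a drop-in module; only the capacity of the mapping changes, not the token count or the interface to the LLM.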

I see. Thank you for your reply!