PKU-YuanGroup / Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Paper: https://arxiv.org/abs/2311.08046

Question about TCBlock

HYUNJS opened this issue · comments

  • In the original version, cross-attention was performed within the TCBlock. However, our experiments showed that this operation significantly destabilized training, so we removed the cross-attention from the TCBlock.
  • We did not run a full ablation of the MLP, but LLaVA reported that an MLP projector works better than a single linear layer.
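For reference, the linear-vs-MLP projector choice mentioned above can be sketched as follows. This is a minimal illustration, not the repo's actual code; the dimensions (1024 visual features, 4096 LLM hidden size) and the GELU activation are assumptions based on common LLaVA-style setups.

```python
import torch
import torch.nn as nn

# Assumed dimensions for illustration only.
vision_dim, llm_dim = 1024, 4096

# Option 1: a single linear projection from visual features to the LLM space.
linear_proj = nn.Linear(vision_dim, llm_dim)

# Option 2: an MLP projector (two linear layers with a GELU in between),
# the design LLaVA reported to work better than a plain linear layer.
mlp_proj = nn.Sequential(
    nn.Linear(vision_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)

# Both map (batch, num_visual_tokens, vision_dim) -> (batch, num_visual_tokens, llm_dim).
tokens = torch.randn(2, 64, vision_dim)
print(linear_proj(tokens).shape)  # torch.Size([2, 64, 4096])
print(mlp_proj(tokens).shape)     # torch.Size([2, 64, 4096])
```

Either projector is a drop-in module; only the capacity of the mapping changes, not the token count or the interface to the LLM.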

I see. Thank you for your reply!