liury889's starred repositories
awesome-multi-agent-papers
A compilation of the best multi-agent papers
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Advances on Multimodal Large Language Models
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
GroundingGPT
[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model
Chat-UniVi
[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
VideoCrafter
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models