Jihan Yang's starred repositories
clip-beyond-tail
Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights
DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
LanguageBind
[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
visualwebarena
VisualWebArena is a benchmark for multimodal agents.
gemma_pytorch
The official PyTorch implementation of Google's Gemma models