Artanic30's starred repositories
Lumina-T2X
Lumina-T2X is a unified framework for Text to Any Modality Generation
Retrieval-Augmented-Visual-Question-Answering
This is the official repository for Retrieval Augmented Visual Question Answering
VL-CheckList
Evaluating Vision & Language Pretraining Models with Objects, Attributes and Relations.
all-seeing
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
groundingLMM
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" (Oral, ICLR 2023)
taming-transformers
Taming Transformers for High-Resolution Image Synthesis