There are 20 repositories under the multi-modality topic.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
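For readers new to CLIP-based ranking, here is a minimal sketch using the Hugging Face `transformers` CLIP API rather than this repo's own client; the checkpoint name `openai/clip-vit-base-patch32` and the file `photo.jpg` are placeholder assumptions.

```python
# Rank candidate sentences against an image with CLIP (generic sketch,
# not this repository's client API).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")  # placeholder image path
texts = ["a dog on the beach", "a city skyline at night"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax ranks the texts.
probs = outputs.logits_per_image.softmax(dim=-1)
for text, p in zip(texts, probs[0]):
    print(f"{p.item():.3f}  {text}")
```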
✨✨ Latest Advances on Multimodal Large Language Models
Simple command line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun
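As a usage sketch (assuming `pip install deep-daze`; this mirrors the repo's documented Python interface, with the prompt text as a placeholder):

```python
# Text-to-image generation via deep-daze's Python API; the package also
# installs an `imagine` CLI for the same purpose.
from deep_daze import Imagine

imagine = Imagine(
    text="cosmic love and attention",  # text prompt to optimize toward
    num_layers=24,                     # depth of the Siren implicit network
)
imagine()  # runs the CLIP-guided optimization, saving images as it trains
```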
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Algorithms and Publications on 3D Object Tracking
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
[CVPR 2023] Collaborative Diffusion
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
An official PyTorch implementation of the CRIS paper
An open-source implementation for training LLaVA-NeXT.
Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022)
This repo contains the official code of our work SAM-SLR, which won the CVPR 2021 Challenge on Large Scale Signer Independent Isolated Sign Language Recognition.
Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Embed arbitrary modalities (images, audio, documents, etc.) into large language models.
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations, ICCV 2021
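For context, cross-modal contrastive learning aligns paired embeddings from two modalities. Below is a generic symmetric InfoNCE sketch in PyTorch; it only illustrates the family of objectives CrossCLR builds on, not the paper's exact loss, which refines this baseline.

```python
# Generic cross-modal InfoNCE loss (illustrative baseline, not CrossCLR itself).
import torch
import torch.nn.functional as F

def cross_modal_infonce(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired video/text embeddings."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.T / temperature       # (B, B) similarity matrix
    targets = torch.arange(v.size(0))    # matched pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2

# Example with random stand-ins for encoder outputs:
loss = cross_modal_infonce(torch.randn(8, 512), torch.randn(8, 512))
```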
Official code for WACV 2021 paper - Compositional Learning of Image-Text Query for Image Retrieval