ycsun1972

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python | License: MIT | 352600

Valley

The official repository of "Video assistant towards large language model makes everything easy"

Language: Python | 19300

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

1086400

VisionLLM

VisionLLM Series

Language: Python | License: Apache-2.0 | 76400

esper

ESPER

Language: Python | 2300

TestOfTime

Official code for our CVPR 2023 paper: Test of Time: Instilling Video-Language Models with a Sense of Time

Language: Python | License: MIT | 4500

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

80200

Text2Poster-ICASSP-22

Official implementation of the ICASSP-2022 paper "Text2Poster: Laying Out Stylized Texts on Retrieved Images"

Language: Python | License: MIT | 20100

FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Language: Python | License: Apache-2.0 | 15100

MMTG

[ACM MM 2022]: Multi-Modal Experience Inspired AI Creation

Language: Python | 1800

awesome-multimodal-dialogue

Paper, dataset and code list for multimodal dialogue.

License: MIT | 1800

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook | License: BSD-3-Clause | 929800

language-guided-animation

Language-Guided Face Animation by Recurrent StyleGAN-based Generator

1800

Awesome-Computer-Vision-Paper-List

This repository contains all the papers accepted at top computer vision conferences, organized so that related papers are easy to search.

Language: Python | License: MIT | 63400

TextBox

TextBox 2.0 is a text generation library with pre-trained language models

Language: Python | License: MIT | 106700

awesome-audiovisual-learning

A curated list of audio-visual learning methods and datasets.

21400

XPretrain

Multi-modality pre-training

Language: Python | License: NOASSERTION | 45600

awesome-embodied-vision

Reading list for research topics in embodied vision

License: MIT | 46200

CVPR2024-Paper-Code-Interpretation

A collection of papers, code, paper interpretations, and livestreams for cvpr2024/cvpr2023/cvpr2022/cvpr2021/cvpr2020/cvpr2019/cvpr2018/cvpr2017, compiled by the 极市 team.

1237600

Transformer-in-Vision

Recent Transformer-based CV and related works.

130700