Tongjia's repositories
CVPR23-LOVEU-AQTC
【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge
adapt-image-models
[ICLR'23] AIM: Adapting Image Models for Efficient Video Understanding
Ask-Anything
ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Awesome-Anything
General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX
Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models and Score-based Models, a dark horse in the field of Generative Models
awesome-video-domain-adaptation
A comprehensive collection of awesome research and other items about video domain adaptation
Awesome_Prompting_Papers_in_Computer_Vision
A curated list of prompt-based papers in computer vision and vision-language learning.
CoOp
Prompt Learning for Vision-Language Models
CPL
Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
awesome-vision-and-language
A curated list of awesome vision and language resources (still under construction... stay tuned!)
l2p
Learning to Prompt (L2P) for Continual Learning @ CVPR22 and DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning @ ECCV22
LaTeX-OCR
pix2tex: Using a ViT to convert images of equations into LaTeX code.
LaViLa
Code release for "Learning Video Representations from Large Language Models"
llama
Inference code for LLaMA models
LLaVA
Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
LLM-in-Vision
Recent LLM-based CV and related works. Welcome to comment/contribute!
MiniGPT-4
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
MovieChat
🔥 Chat with over 10k frames of video!
multimodal-prompt-learning
[CVPR 2023] Official repository of the paper "MaPLe: Multi-modal Prompt Learning".
my-tools
My commonly used tools
OT_for_big_data
Optimal Transport in the Big Data Era
stable-diffusion-videos
Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts
tomchen-ctj
Config files for my GitHub profile.
TQVSR
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
ViP-LLaVA
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Vita-CLIP
Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]