Tongjia's repositories

OST

【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

Language:Python License:MIT Stargazers:29 Issues:1 Issues:0

CVPR23-LOVEU-AQTC

【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge

Language:Python Stargazers:15 Issues:0 Issues:0

adapt-image-models

[ICLR'23] AIM: Adapting Image Models for Efficient Video Understanding

Language:Python License:Apache-2.0 Stargazers:0 Issues:0 Issues:0

Ask-Anything

ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Language:Python License:MIT Stargazers:0 Issues:0 Issues:0

Awesome-Anything

General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX

Stargazers:0 Issues:0 Issues:0

Awesome-Diffusion-Models

A collection of resources and papers on Diffusion Models and Score-based Models, a dark horse in the field of Generative Models

License:MIT Stargazers:0 Issues:0 Issues:0

awesome-video-domain-adaptation

A comprehensive collection of awesome research and other items about video domain adaptation

License:MIT Stargazers:0 Issues:0 Issues:0

Awesome_Prompting_Papers_in_Computer_Vision

A curated list of prompt-based papers in computer vision and vision-language learning.

Stargazers:0 Issues:0 Issues:0

CoOp

Prompt Learning for Vision-Language Models

Language:Python License:MIT Stargazers:0 Issues:0 Issues:0

CPL

Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"

Language:Python License:MIT Stargazers:0 Issues:0 Issues:0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

Stargazers:0 Issues:0 Issues:0

awesome-vision-and-language

A curated list of awesome vision and language resources (still under construction... stay tuned!)

Stargazers:0 Issues:0 Issues:0

l2p

Learning to Prompt (L2P) for Continual Learning @ CVPR22 and DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning @ ECCV22

License:Apache-2.0 Stargazers:0 Issues:0 Issues:0

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Language:Python License:MIT Stargazers:0 Issues:0 Issues:0

LaViLa

Code release for "Learning Video Representations from Large Language Models"

Language:Python License:MIT Stargazers:0 Issues:0 Issues:0

llama

Inference code for LLaMA models

License:NOASSERTION Stargazers:0 Issues:0 Issues:0

LLaVA

Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.

Language:Python License:Apache-2.0 Stargazers:0 Issues:0 Issues:0

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

Stargazers:0 Issues:0 Issues:0

MiniGPT-4

MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models

License:BSD-3-Clause Stargazers:0 Issues:0 Issues:0

MovieChat

🔥 chat with over 10k frames of video!

Language:Python License:BSD-3-Clause Stargazers:0 Issues:0 Issues:0

multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".

License:MIT Stargazers:0 Issues:0 Issues:0

my-tools

my commonly-used tools

Stargazers:0 Issues:0 Issues:0

OT_for_big_data

Optimal Transport in the Big Data Era

Stargazers:0 Issues:0 Issues:0

stable-diffusion-videos

Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts

Language:Python License:Apache-2.0 Stargazers:0 Issues:0 Issues:0

tomchen-ctj

Config files for my GitHub profile.

Stargazers:0 Issues:1 Issues:0

TQVSR

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant

License:MIT Stargazers:0 Issues:0 Issues:0

video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

License:Apache-2.0 Stargazers:0 Issues:0 Issues:0

ViP-LLaVA

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

License:Apache-2.0 Stargazers:0 Issues:0 Issues:0

Vita-CLIP

Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]

Language:Python License:MIT Stargazers:0 Issues:0 Issues:0