tomchen-ctj

Tongjia's repositories

OST

【CVPR'24】OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition

Language: Python, MIT, 32 4 3

CVPR23-LOVEU-AQTC

【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge

Language: Python, 15 20

tomchen-ctj.github.io

Language: HTML, 2 10

adapt-image-models

[ICLR'23] AIM: Adapting Image Models for Efficient Video Understanding

Language: Python, Apache-2.0, 0 0 0

Ask-Anything

ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Language: Python, MIT, 0 0 0

Awesome-Anything

General AI methods for Anything: AnyObject, AnyGeneration, AnyModel, AnyTask, AnyX

0 0 0

Awesome-Diffusion-Models

A collection of resources and papers on Diffusion Models and Score-based Models, a dark horse in the field of Generative Models

MIT, 0 0 0

awesome-video-domain-adaptation

A comprehensive collection of awesome research and other items about video domain adaptation

MIT, 0 0 0

Awesome_Prompting_Papers_in_Computer_Vision

A curated list of prompt-based papers in computer vision and vision-language learning.

0 0 0

CoOp

Prompt Learning for Vision-Language Models

Language: Python, MIT, 0 0 0

CPL

Official implementation of our EMNLP 2022 paper "CPL: Counterfactual Prompt Learning for Vision and Language Models"

Language: Python, MIT, 0 0 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

0 0 0

awesome-vision-and-language

A curated list of awesome vision and language resources (still under construction... stay tuned!)

0 0 0

l2p

Learning to Prompt (L2P) for Continual Learning @ CVPR22 and DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning @ ECCV22

Language: Python, Apache-2.0, 0 0 0

LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Language: Python, MIT, 0 0 0

LaViLa

Code release for "Learning Video Representations from Large Language Models"

Language: Python, MIT, 0 0 0

llama

Inference code for LLaMA models

NOASSERTION, 0 0 0

LLaVA

Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.

Language: Python, Apache-2.0, 0 0 0

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

0 0 0

MiniGPT-4

MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models

Language: Python, BSD-3-Clause, 0 0 0

MovieChat

🔥 Chat with over 10k frames of video!

Language: Python, BSD-3-Clause, 0 0 0

multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".

Language: Python, MIT, 0 0 0

my-tools

My commonly used tools

Language: Jupyter Notebook, 0 0 0

OT_for_big_data

Optimal Transport in the Big Data Era

0 0 0

stable-diffusion-videos

Create 🔥 videos with Stable Diffusion by exploring the latent space and morphing between text prompts

Language: Python, Apache-2.0, 0 0 0

tomchen-ctj

Config files for my GitHub profile.

0 1 0

TQVSR

AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant

Language: Python, MIT, 0 0 0

video-retalking

[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

Apache-2.0, 0 0 0

ViP-LLaVA

ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Language: Python, Apache-2.0, 0 0 0

Vita-CLIP

Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]

Language: Python, MIT, 0 0 0