minjoong507

Minjoon Jung's starred repositories

llama

Inference code for Llama models

Language: Python | License: NOASSERTION | 55851 522 963

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 19599 158 1497

LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Language: Python | License: GPL-3.0 | 5707 78 142

Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Language: Python | License: MIT | 3007 36 226

Chain-of-ThoughtsPapers

A trend starts from "Chain of Thought Prompting Elicits Reasoning in Large Language Models".

1927 49 3

viper

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

Language: Jupyter Notebook | License: NOASSERTION | 1653 89 47

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

1343 38 4

GenAI_LLM_timeline

ChatGPT, GenerativeAI and LLMs Timeline

941 84 4

VideoMamba

[ECCV2024] VideoMamba: State Space Model for Efficient Video Understanding

Language: Python | License: Apache-2.0 | 797 12 89

LaViLa

Code release for "Learning Video Representations from Large Language Models"

Language: Python | License: MIT | 481 8 35

actionformer_release

Code release for ActionFormer (ECCV 2022)

Language: Python | License: MIT | 423 10 133

TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Language: Python | License: BSD-3-Clause | 274 5 45

vision-language-models-are-bows

Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023

Language: Python | License: MIT | 243 8 37

VTimeLLM

[CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Language: Python | License: NOASSERTION | 208 2 35

CGDETR

Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Grounding"

Language: Python | License: NOASSERTION | 110 5 19

VTG-GPT

VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT

Language: Python | License: MIT | 70 2 2

videocon

Language: Python | License: MIT | 53 3 1

NExT-GQA

Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)

Language: Python | License: MIT | 52 1 6

caption_contest_corpus

Corpus to accompany: "Do Androids Laugh at Electric Sheep? Humor "Understanding" Benchmarks from The New Yorker Caption Contest"

Language: Python | License: MIT | 49 20

dan-visdial

✨ Official PyTorch Implementation for EMNLP'19 Paper, "Dual Attention Networks for Visual Reference Resolution in Visual Dialog"

Language: Python | License: MIT | 45 5 6

Momentor

Language: Python | 43 7 9

FreeVA

FreeVA: Offline MLLM as Training-Free Video Assistant

Language: Python | License: Apache-2.0 | 43 2 7

esper

ESPER

Language: Python | 22 1 4

gst-visdial

💬 Official PyTorch Implementation for CVPR'23 Paper, "The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training"

Language: Python | License: MIT | 18 3 4

sglkt-visdial

🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"

Language: Python | License: MIT | 13 5 1

BM-DETR

[WACV 2025] Official Pytorch code for "Background-aware Moment Detection for Video Moment Retrieval"

Language: Python | License: MIT | 12 3 5

SelecMix

SelecMix: Debiased Learning by Contradicting-pair Sampling (NeurIPS 2022)

Language: Python | License: MIT | 11 20

Fine-Grained-Causal-RL

Fine-Grained Causal Dynamics Learning with Quantization for Improving Robustness in Reinforcement Learning (ICML 2024)

Language: Python | License: MIT | 9 20

MPGN

[EMNLP 2022] Official Pytorch code for "Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval"

Language: Python | License: MIT | 7 2 2

GVCCI

[IROS 2023] GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation

Language: Python | 6 2 2