Aaron Han's repositories
ACoLP
Open Set Video HOI detection from Action-centric Chain-of-Look Prompting, ICCV2023
AU-Net
Towards robust facial action units detection
Chat-UniVi
[CVPR 2024🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
ChatCaptioner
Official Repository of ChatCaptioner
ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
ChineseNMT
ChineseNMT: Translate English to Chinese with PyTorch Implementation of Transformer
DL_GCN
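A hand-written convolutional neural network kernel for node classification and link prediction on graphs. Experiments are run on three datasets (Cora, Citeseer, and PPI), with an analysis of how self-loops, number of layers, DropEdge, PairNorm, and activation functions affect the model's classification and prediction performance.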
Emotion-Investigator
A deep learning-based Flask web app that predicts users' facial expressions and visualizes them graphically.
EVA
EVA Series: Visual Representation Fantasies from BAAI
FastChat
The release repo for "Vicuna: An Open Chatbot Impressing GPT-4"
FrozenBiLM
[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Glance-Focus
This repo contains the source code for "Glance and Focus: Memory Prompting for Multi-Event Video Question Answering" (accepted at NeurIPS 2023)
InvReg
Invariant Feature Regularization for Fair Face Recognition (ICCV'23)
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.
LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
LSTP-Chat
A Video Chat Agent with Temporal Prior
MiniGPT-4
MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models
mm-cot
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
NExT-GQA
Can I Trust Your Answer? Visually Grounded VideoQA (Accepted to CVPR'24)
prophet
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
PSVL
Code for the paper "Zero-shot Natural Language Video Localization" (ICCV2021, Oral).
SeViLA
Self-Chained Image-Language Model for Video Localization and Question Answering
StatisticalLearning_USTC
Review materials for the Statistical Learning course (Liu Dong) at USTC.
TimeChat
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Video-ChatGPT
"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
VidToMe
Official PyTorch Implementation for "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)
VTimeLLM
Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".