Aaron Han's repositories

ACoLP

Open Set Video HOI detection from Action-centric Chain-of-Look Prompting, ICCV2023

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

AU-Net

Towards robust facial action units detection

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

Chat-UniVi

[CVPR 2024🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

ChatCaptioner

Official Repository of ChatCaptioner

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

ChineseNMT

ChineseNMT: Translate English to Chinese with PyTorch Implementation of Transformer

Stargazers: 0 | Issues: 0 | Issues: 0

DL_GCN

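Hand-written convolutional neural network kernels for node classification and link prediction on graphs. Experiments run on three datasets (cora, citeseer, ppi) and analyze how self-loops, number of layers, DropEdge, PairNorm, activation functions, and other factors affect the model's classification and prediction performance.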

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

Emotion-Investigator

A deep learning-based Flask web app that predicts users' facial expressions and visualizes the predicted expressions graphically.

Language: Jupyter Notebook | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

EVA

EVA Series: Visual Representation Fantasies from BAAI

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

FastChat

The release repo for "Vicuna: An Open Chatbot Impressing GPT-4"

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

FrozenBiLM

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Glance-Focus

This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023)

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

InvReg

Invariant Feature Regularization for Fair Face Recognition (ICCV'23)

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Jupyter Notebook | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

LLoVi

Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

LSTP-Chat

A Video Chat Agent with Temporal Prior

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0
Stargazers: 0 | Issues: 1 | Issues: 0

MiniGPT-4

MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

NExT-GQA

Can I Trust Your Answer? Visually Grounded VideoQA (Accepted to CVPR'24)

License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

prophet

Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

PSVL

Code for the paper "Zero-shot Natural Language Video Localization" (ICCV2021, Oral).

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

SeViLA

Self-Chained Image-Language Model for Video Localization and Question Answering

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

StatisticalLearning_USTC

Statistical Learning course at USTC. Review materials for the USTC Statistical Learning course (taught by Liu Dong).

Language: TeX | Stargazers: 0 | Issues: 0 | Issues: 0

TimeChat

TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding

Language: Python | License: BSD-3-Clause | Stargazers: 0 | Issues: 0 | Issues: 0

Video-ChatGPT

"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

Language: Python | License: CC-BY-4.0 | Stargazers: 0 | Issues: 0 | Issues: 0

VidToMe

Official Pytorch Implementation for "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)

Stargazers: 0 | Issues: 0 | Issues: 0

VTimeLLM

Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0