Aaron Han's repositories
ACoLP
Open Set Video HOI Detection from Action-centric Chain-of-Look Prompting (ICCV 2023)
Chat-UniVi
[CVPR 2024🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
EVA
EVA Series: Visual Representation Fantasies from BAAI
explore-eqa
Public release for "Explore until Confident: Efficient Exploration for Embodied Question Answering"
flash-attention
Fast and memory-efficient exact attention
Glance-Focus
Source code for "Glance and Focus: Memory Prompting for Multi-Event Video Question Answering" (NeurIPS 2023)
InvReg
Invariant Feature Regularization for Fair Face Recognition (ICCV'23)
LangRepo
Language Repository for Long Video Understanding
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards GPT-4V level capabilities.
LLM-Adapters
Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"
LLoVi
Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"
LSTP-Chat
A Video Chat Agent with Temporal Prior
MA-LMM
[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
memorizing-transformers-pytorch
Implementation of Memorizing Transformers (ICLR 2022), an attention network augmented with indexing and retrieval of memories using approximate nearest neighbors, in PyTorch
mm-cot
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)
MovieChat
[CVPR 2024] 🎬💭 chat with over 10K frames of video!
NExT-GQA
Can I Trust Your Answer? Visually Grounded VideoQA (Accepted to CVPR'24)
self-rag
This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
SeViLA
Self-Chained Image-Language Model for Video Localization and Question Answering
StatisticalLearning_USTC
Review materials for the Statistical Learning course (taught by Dong Liu) at USTC
Video-ChatGPT
"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
VideoTree
Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
VidToMe
Official PyTorch implementation of "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)
VTimeLLM
Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".