Aaron Han's repositories

ACoLP

Open Set Video HOI Detection from Action-centric Chain-of-Look Prompting (ICCV 2023)

Language: Python

Chat-UniVi

[CVPR 2024🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python · License: Apache-2.0

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

License: Apache-2.0

EVA

EVA Series: Visual Representation Fantasies from BAAI

License: MIT

explore-eqa

Public release for "Explore until Confident: Efficient Exploration for Embodied Question Answering"


flash-attention

Fast and memory-efficient exact attention

License: BSD-3-Clause

Glance-Focus

Source code for "Glance and Focus: Memory Prompting for Multi-Event Video Question Answering" (NeurIPS 2023)

License: MIT

InvReg

Invariant Feature Regularization for Fair Face Recognition (ICCV'23)

Language: Python · License: MIT

LangRepo

Language Repository for Long Video Understanding

License: MIT

LAVIS

LAVIS: a one-stop library for language-vision intelligence

Language: Jupyter Notebook · License: BSD-3-Clause

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant), built toward GPT-4V-level capabilities

Language: Python · License: Apache-2.0

LLM-Adapters

Code for our EMNLP 2023 Paper: "LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models"

Language: Python · License: Apache-2.0

LLoVi

Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"

Language: Python · License: MIT

LSTP-Chat

A Video Chat Agent with Temporal Prior

License: MIT

MA-LMM

[CVPR 2024] MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

License: MIT

memorizing-transformers-pytorch

Implementation of Memorizing Transformers (ICLR 2022), an attention network augmented with indexing and retrieval of memories via approximate nearest neighbors, in PyTorch

License: MIT

mm-cot

Official implementation of "Multimodal Chain-of-Thought Reasoning in Language Models" (more updates to come)

Language: Python · License: Apache-2.0

MovieChat

[CVPR 2024] 🎬💭 Chat with over 10K frames of video!

License: BSD-3-Clause

NExT-GQA

Can I Trust Your Answer? Visually Grounded VideoQA (CVPR'24)

Language: Python · License: MIT

self-rag

Original implementation of "SELF-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection" by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi

License: MIT

SeViLA

Self-Chained Image-Language Model for Video Localization and Question Answering

Language: Python · License: BSD-3-Clause

StatisticalLearning_USTC

Review materials for the Statistical Learning course at USTC (taught by Liu Dong)

Language: TeX

Video-ChatGPT

Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation, and introduces a rigorous quantitative evaluation benchmark for video-based conversational models.

Language: Python · License: CC-BY-4.0

VideoTree

Code for the paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"

License: MIT

VidToMe

Official PyTorch implementation of "VidToMe: Video Token Merging for Zero-Shot Video Editing" (CVPR 2024)


VTimeLLM

Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments"

Language: Python · License: Other