maginahuang's starred repositories

LongVideoBench

Official Dataloader and Evaluation Scripts for LongVideoBench.

Language: Python | Stargazers: 38 | Issues: 0

persona-hub

Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"

Language: Python | Stargazers: 646 | Issues: 0

MM-Instruct

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

Language: Python | License: Apache-2.0 | Stargazers: 26 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 64 | Issues: 0

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python | License: Apache-2.0 | Stargazers: 1446 | Issues: 0

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | Stargazers: 1086 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 7032 | Issues: 0

taming-transformers

Taming Transformers for High-Resolution Image Synthesis

Language: Jupyter Notebook | License: MIT | Stargazers: 5598 | Issues: 0

MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Language: Python | License: NOASSERTION | Stargazers: 191 | Issues: 0

enhancing-transformers

An unofficial implementation of both ViT-VQGAN and RQ-VAE in Pytorch

Language: Python | License: MIT | Stargazers: 275 | Issues: 0

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python | License: Apache-2.0 | Stargazers: 1576 | Issues: 0

llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Language: Jupyter Notebook | License: MIT | Stargazers: 11569 | Issues: 0

MiraData

Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"

Language: Python | License: GPL-3.0 | Stargazers: 303 | Issues: 0

SuperCLUE-Role

SuperCLUE-Role: a native Chinese role-playing evaluation benchmark

Stargazers: 18 | Issues: 0

ffmpeg-build-script

The FFmpeg build script provides an easy way to build a static FFmpeg on OSX and Linux with non-free codecs included.

Language: Shell | License: MIT | Stargazers: 998 | Issues: 0

TransNetV2

TransNet V2: Shot Boundary Detection Neural Network

Language: Python | License: MIT | Stargazers: 402 | Issues: 0

MathVerse

[ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Language: Python | License: MIT | Stargazers: 118 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 23842 | Issues: 0

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | Stargazers: 3462 | Issues: 0

MultiInstruct

MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

Language: Python | License: Apache-2.0 | Stargazers: 130 | Issues: 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stargazers: 10911 | Issues: 0

self-instruct

Aligning pretrained language models with instruction data generated by themselves.

Language: Python | License: Apache-2.0 | Stargazers: 3974 | Issues: 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 18415 | Issues: 0

StableLLAVA

Official repo for StableLLAVA

Language: Python | License: Apache-2.0 | Stargazers: 89 | Issues: 0

LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

Language: Python | Stargazers: 494 | Issues: 0

DECOLA

Code release for "Language-conditioned Detection Transformer"

Language: Python | Stargazers: 79 | Issues: 0

Awesome-Parameter-Efficient-Transfer-Learning

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

License: MIT | Stargazers: 376 | Issues: 0

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 15255 | Issues: 0

VLM_survey

Collection of AWESOME vision-language models for vision tasks

Stargazers: 2051 | Issues: 0

decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Language: Python | License: MIT | Stargazers: 2278 | Issues: 0