maginahuang's starred repositories

VILA

VILA - A multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python | License: Apache-2.0 | Stargazers: 204 | Issues: 0

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python | License: Apache-2.0 | Stargazers: 458 | Issues: 0

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | Stargazers: 803 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 6969 | Issues: 0

taming-transformers

Taming Transformers for High-Resolution Image Synthesis

Language: Jupyter Notebook | License: MIT | Stargazers: 5509 | Issues: 0

MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Language: Python | License: NOASSERTION | Stargazers: 167 | Issues: 0

enhancing-transformers

An unofficial implementation of both ViT-VQGAN and RQ-VAE in Pytorch

Language: Python | License: MIT | Stargazers: 263 | Issues: 0

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python | License: Apache-2.0 | Stargazers: 1551 | Issues: 0

llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Language: Jupyter Notebook | License: MIT | Stargazers: 10589 | Issues: 0

Language: Python | License: NOASSERTION | Stargazers: 168 | Issues: 0

SuperCLUE-Role

SuperCLUE-Role: a native Chinese role-playing evaluation benchmark

Stargazers: 13 | Issues: 0

ffmpeg-build-script

The FFmpeg build script provides an easy way to build a static FFmpeg on OSX and Linux with non-free codecs included.

Language: Shell | License: MIT | Stargazers: 979 | Issues: 0

TransNetV2

TransNet V2: Shot Boundary Detection Neural Network

Language: Python | License: MIT | Stargazers: 376 | Issues: 0

MathVerse

Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Language: Python | License: MIT | Stargazers: 111 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 21474 | Issues: 0

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | Stargazers: 3084 | Issues: 0

MultiInstruct

MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

Language: Python | License: Apache-2.0 | Stargazers: 126 | Issues: 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stargazers: 10161 | Issues: 0

self-instruct

Aligning pretrained language models with instruction data generated by themselves.

Language: Python | License: Apache-2.0 | Stargazers: 3901 | Issues: 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 17634 | Issues: 0

StableLLAVA

Official repo for StableLLAVA

Language: Python | License: Apache-2.0 | Stargazers: 86 | Issues: 0

LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

Language: Python | Stargazers: 491 | Issues: 0

DECOLA

Code release for "Language-conditioned Detection Transformer"

Language: Python | Stargazers: 77 | Issues: 0

Awesome-Parameter-Efficient-Transfer-Learning

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

License: MIT | Stargazers: 360 | Issues: 0

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | Stargazers: 14750 | Issues: 0

VLM_survey

Collection of AWESOME vision-language models for vision tasks

Stargazers: 1918 | Issues: 0

decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Language: Python | License: MIT | Stargazers: 2229 | Issues: 0

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Language: Python | License: MIT | Stargazers: 5729 | Issues: 0

LURE

[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Language: Python | Stargazers: 116 | Issues: 0

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | Stargazers: 35391 | Issues: 0