Beast code in Giters

maginahuang's starred repositories

VILA

VILA - A multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python | License: Apache-2.0 | 20400

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python | License: Apache-2.0 | 45800

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | 80300

LWM

Language: Python | License: Apache-2.0 | 696900

taming-transformers

Taming Transformers for High-Resolution Image Synthesis

Language: Jupyter Notebook | License: MIT | 550900

MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Language: Python | License: NOASSERTION | 16700

enhancing-transformers

An unofficial implementation of both ViT-VQGAN and RQ-VAE in PyTorch

Language: Python | License: MIT | 26300

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python | License: Apache-2.0 | 155100

llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Language: Jupyter Notebook | License: MIT | 1058900

MiraData

Language: Python | License: NOASSERTION | 16800

SuperCLUE-Role

SuperCLUE-Role: a native Chinese role-playing evaluation benchmark

1300

ffmpeg-build-script

The FFmpeg build script provides an easy way to build a static FFmpeg on OSX and Linux with non-free codecs included.

Language: Shell | License: MIT | 97900

TransNetV2

TransNet V2: Shot Boundary Detection Neural Network

Language: Python | License: MIT | 37600

MathVerse

Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Language: Python | License: MIT | 11100

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 2147400

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | 308400

MultiInstruct

MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

Language: Python | License: Apache-2.0 | 12600

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

1016100

self-instruct

Aligning pretrained language models with instruction data generated by themselves.

Language: Python | License: Apache-2.0 | 390100

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 1763400

StableLLAVA

Official repo for StableLLAVA

Language: Python | License: Apache-2.0 | 8600

LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

Language: Python | 49100

DECOLA

Code release for "Language-conditioned Detection Transformer"

Language: Python | 7700

Awesome-Parameter-Efficient-Transfer-Learning

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

License: MIT | 36000

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | 1475000

VLM_survey

Collection of AWESOME vision-language models for vision tasks

191800

decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Language: Python | License: MIT | 222900

FlagEmbedding

Retrieval and Retrieval-augmented LLMs

Language: Python | License: MIT | 572900

LURE

[ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

Language: Python | 11600

FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Language: Python | License: Apache-2.0 | 3539100