Beast code in Giters

maginahuang's starred repositories

LongVideoBench

Official Dataloader and Evaluation Scripts for LongVideoBench.

Language: Python | 38 0 0

persona-hub

Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"

Language: Python | 646 0 0

MM-Instruct

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

Language: Python | License: Apache-2.0 | 26 0 0

dreambench_plus

Language: Python | License: Apache-2.0 | 64 0 0

ml-4m

4M: Massively Multimodal Masked Modeling

Language: Python | License: Apache-2.0 | 1446 0 0

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python | License: MIT | 1086 0 0

LWM

Language: Python | License: Apache-2.0 | 7032 0 0

taming-transformers

Taming Transformers for High-Resolution Image Synthesis

Language: Jupyter Notebook | License: MIT | 5598 0 0

MotionLLM

[Arxiv-2024] MotionLLM: Understanding Human Behaviors from Human Motions and Videos

Language: Python | License: NOASSERTION | 191 0 0

enhancing-transformers

An unofficial implementation of both ViT-VQGAN and RQ-VAE in PyTorch

Language: Python | License: MIT | 275 0 0

Emu

Emu Series: Generative Multimodal Models from BAAI

Language: Python | License: Apache-2.0 | 1576 0 0

llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Language: Jupyter Notebook | License: MIT | 11569 0 0

MiraData

Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"

Language: Python | License: GPL-3.0 | 303 0 0

SuperCLUE-Role

SuperCLUE-Role: a native Chinese role-playing evaluation benchmark

18 0 0

ffmpeg-build-script

The FFmpeg build script provides an easy way to build a static FFmpeg on OSX and Linux with non-free codecs included.

Language: Shell | License: MIT | 998 0 0

TransNetV2

TransNet V2: Shot Boundary Detection Neural Network

Language: Python | License: MIT | 402 0 0

MathVerse

[ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?

Language: Python | License: MIT | 118 0 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | 23842 0 0

xtuner

An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)

Language: Python | License: Apache-2.0 | 3462 0 0

MultiInstruct

MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning

Language: Python | License: Apache-2.0 | 130 0 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

10911 0 0

self-instruct

Aligning pretrained language models with instruction data generated by themselves.

Language: Python | License: Apache-2.0 | 3974 0 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | 18415 0 0

StableLLAVA

Official repo for StableLLAVA

Language: Python | License: Apache-2.0 | 89 0 0

LaVIN

[NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models"

Language: Python | 494 0 0

DECOLA

Code release for "Language-conditioned Detection Transformer"

Language: Python | 79 0 0

Awesome-Parameter-Efficient-Transfer-Learning

A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.

License: MIT | 376 0 0

peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Language: Python | License: Apache-2.0 | 15255 0 0

VLM_survey

Collection of AWESOME vision-language models for vision tasks

2051 0 0

decision-transformer

Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Language: Python | License: MIT | 2278 0 0