Beast code in Giters

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.

Language: C++ · License: Apache-2.0 · Stars: 114

florence2-finetuning

Quick exploration into fine-tuning Florence-2

Language: Jupyter Notebook · License: MIT · Stars: 190

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python · License: NOASSERTION · Stars: 1520

t2v-turbo

Code repository for T2V-Turbo

Language: Python · Stars: 141

ConvBench

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models

Language: Python · Stars: 1

JT-VL-Chat

Language: Python · Stars: 2

WeMM

Language: Python · License: Apache-2.0 · Stars: 74

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stars: 10675

OmniCorpus

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Stars: 185

stable-audio-tools

Generative models for conditional audio generation

Language: Python · License: MIT · Stars: 2303

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python · License: MIT · Stars: 1031

PLLaVA

Official repository for the paper PLLaVA

Language: Python · Stars: 486

DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"

Language: Python · License: Apache-2.0 · Stars: 183

MMT-Bench

ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Language: Python · Stars: 72

RectifiedFlow

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Language: Python · Stars: 692

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python · License: Apache-2.0 · Stars: 3367

Video-MME

✨✨ Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Stars: 304

flash-diffusion

Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Language: Python · License: NOASSERTION · Stars: 389

vision-lstm

xLSTM as Generic Vision Backbone

Language: Python · License: AGPL-3.0 · Stars: 337

build-nanogpt

Video+code lecture on building nanoGPT from scratch

Language: Python · Stars: 2998

masa

Official implementation of the CVPR 2024 highlight paper: Matching Anything by Segmenting Anything

Language: Python · License: Apache-2.0 · Stars: 853

3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Language: Python · License: Apache-2.0 · Stars: 945

Omost

Your image is almost there!

Language: Python · License: Apache-2.0 · Stars: 6898

seed-tts-eval

Language: Python · Stars: 804