Pass-O-Guava's starred repositories

dash-infer

DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.

Language: C++ · License: Apache-2.0 · Stargazers: 93 · Issues: 0

florence2-finetuning

Quick exploration into fine-tuning Florence-2

Language: Jupyter Notebook · License: MIT · Stargazers: 75 · Issues: 0

chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.

Language: Python · License: NOASSERTION · Stargazers: 1352 · Issues: 0

t2v-turbo

Code repository for T2V-Turbo

Language: Python · Stargazers: 123 · Issues: 0

ConvBench

ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Ablation Capability for Large Vision-Language Models

Stargazers: 1 · Issues: 0
Language: Python · Stargazers: 2 · Issues: 0
Language: Python · License: Apache-2.0 · Stargazers: 69 · Issues: 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stargazers: 10297 · Issues: 0

OmniCorpus

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

Stargazers: 152 · Issues: 0

stable-audio-tools

Generative models for conditional audio generation

Language: Python · License: MIT · Stargazers: 2216 · Issues: 0

LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Language: Python · License: MIT · Stargazers: 865 · Issues: 0

PLLaVA

Official repository for the paper PLLaVA

Language: Python · Stargazers: 456 · Issues: 0

DiffusionDPO

Code for "Diffusion Model Alignment Using Direct Preference Optimization"

Language: Python · License: Apache-2.0 · Stargazers: 160 · Issues: 0

MMT-Bench

ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Language: Python · Stargazers: 63 · Issues: 0

RectifiedFlow

Official Implementation of Rectified Flow (ICLR2023 Spotlight)

Language: Python · Stargazers: 663 · Issues: 0

lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Language: Python · License: Apache-2.0 · Stargazers: 3110 · Issues: 0

Video-MME

✨✨ Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Stargazers: 276 · Issues: 0

flash-diffusion

Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Language: Python · License: NOASSERTION · Stargazers: 296 · Issues: 0

vision-lstm

xLSTM as Generic Vision Backbone

Language: Python · License: AGPL-3.0 · Stargazers: 302 · Issues: 0

build-nanogpt

Video+code lecture on building nanoGPT from scratch

Language: Python · Stargazers: 2730 · Issues: 0

masa

Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything

Language: Python · License: Apache-2.0 · Stargazers: 783 · Issues: 0

3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization

Language: Python · License: Apache-2.0 · Stargazers: 897 · Issues: 0

Omost

Your image is almost there!

Language: Python · License: Apache-2.0 · Stargazers: 6670 · Issues: 0
Language: Python · Stargazers: 730 · Issues: 0
Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 379 · Issues: 0

TableMASTER-mmocr

Second-place solution for the ICDAR 2021 Competition on Scientific Literature Parsing, Task B.

Language: Python · License: Apache-2.0 · Stargazers: 407 · Issues: 0

MuTabNet

ICDAR 2024 Table OCR Model

Language: Python · License: MIT · Stargazers: 5 · Issues: 0

SEMv3

The official PyTorch implementation of SEMv3.

Language: Python · License: Apache-2.0 · Stargazers: 13 · Issues: 0

MTL-TabNet

MTL-TabNet: Multi-task Learning based Model for Image-based Table Recognition

Language: Python · License: Apache-2.0 · Stargazers: 79 · Issues: 0

TableStructureRec

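A collection of currently available open-source table recognition models, with improved pre- and post-processing and the models converted to ONNX.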

Language: Python · License: Apache-2.0 · Stargazers: 89 · Issues: 0