nixsui

followers

following

stars

Gray's starred repositories

pose2sim

Markerless kinematics with any cameras — From 2D Pose estimation to 3D OpenSim motion

Language:PythonBSD-3-Clause20000

LivePortrait

Bring portraits to life!

Language:PythonMIT688400

I2V-Adapter

I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models

Language:Python10100

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language:PythonApache-2.0155800

gptpdf

Using GPT to parse PDF

Language:PythonMIT229300

MQT-LLaVA

Matryoshka Query Transformer for Large Vision-Language Models

Language:PythonApache-2.08000

ollama

Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.

Language:GoMIT7904100

MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language:PythonApache-2.029700

lama

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

Language:Jupyter NotebookApache-2.0759300

ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Language:PythonAGPL-3.02629500

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language:PythonApache-2.0799900

OOTDiffusion

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Language:PythonNOASSERTION514800

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. 接近GPT-4V表现的可商用开源多模态对话模型

Language:PythonMIT421200

MMT-Bench

ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Language:Python7200

ChartVLM

Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Language:PythonCC-BY-4.019200

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language:PythonApache-2.0310000

Awesome-Multimodal-Large-Language-Models

:sparkles::sparkles:Latest Advances on Multimodal Large Language Models

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language:PythonMIT188400

grok-1

Grok open release

Language:PythonApache-2.04918100

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language:PythonApache-2.02074500

pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Language:PythonApache-2.03081500

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

Language:PythonApache-2.04102100

Open-Sora-Plan

This project aim to reproduce Sora (Open AI T2V model), we wish the open source community contribute to this project.

Language:PythonApache-2.01092800

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language:PythonNOASSERTION570600

OutfitAnyone

Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language:PythonApache-2.0517000

Video-LLaVA

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language:PythonApache-2.0270700

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language:PythonApache-2.01822300

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language:PythonNOASSERTION436500

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language:PythonApache-2.0185000