Gray's starred repositories

pose2sim

Markerless kinematics with any cameras — From 2D Pose estimation to 3D OpenSim motion

Language: Python | License: BSD-3-Clause | Stargazers: 200 | Issues: 0

LivePortrait

Bring portraits to life!

Language: Python | License: MIT | Stargazers: 6884 | Issues: 0

I2V-Adapter

I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models

Language: Python | Stargazers: 101 | Issues: 0

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python | License: Apache-2.0 | Stargazers: 1558 | Issues: 0

gptpdf

Using GPT to parse PDF

Language: Python | License: MIT | Stargazers: 2293 | Issues: 0

MQT-LLaVA

Matryoshka Query Transformer for Large Vision-Language Models

Language: Python | License: Apache-2.0 | Stargazers: 80 | Issues: 0

ollama

Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.

Language: Go | License: MIT | Stargazers: 79041 | Issues: 0

MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Language: Python | License: Apache-2.0 | Stargazers: 297 | Issues: 0

lama

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 7593 | Issues: 0

ultralytics

NEW - YOLOv8 🚀 in PyTorch > ONNX > OpenVINO > CoreML > TFLite

Language: Python | License: AGPL-3.0 | Stargazers: 26295 | Issues: 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 7999 | Issues: 0

OOTDiffusion

Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on

Language: Python | License: NOASSERTION | Stargazers: 5148 | Issues: 0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.

Language: Python | License: MIT | Stargazers: 4212 | Issues: 0

MMT-Bench

ICML'2024 | MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

Language: Python | Stargazers: 72 | Issues: 0

ChartVLM

Official Repository of ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning

Language: Python | License: CC-BY-4.0 | Stargazers: 192 | Issues: 0

MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Language: Python | License: Apache-2.0 | Stargazers: 3100 | Issues: 0

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stargazers: 10714 | Issues: 0

DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Language: Python | License: MIT | Stargazers: 1884 | Issues: 0

grok-1

Grok open release

Language: Python | License: Apache-2.0 | Stargazers: 49181 | Issues: 0

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 20745 | Issues: 0

pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXT, EfficientNet, NFNet, Vision Transformer (ViT), MobileNetV4, MobileNet-V3 & V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Language: Python | License: Apache-2.0 | Stargazers: 30815 | Issues: 0

PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment across server, mobile, embedded, and IoT devices)

Language: Python | License: Apache-2.0 | Stargazers: 41021 | Issues: 0

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python | License: Apache-2.0 | Stargazers: 10928 | Issues: 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5706 | Issues: 0

OutfitAnyone

Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person

Stargazers: 5298 | Issues: 0

gemma_pytorch

The official PyTorch implementation of Google's Gemma models

Language: Python | License: Apache-2.0 | Stargazers: 5170 | Issues: 0

Video-LLaVA

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python | License: Apache-2.0 | Stargazers: 2707 | Issues: 0

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Language: Python | License: Apache-2.0 | Stargazers: 18223 | Issues: 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 4365 | Issues: 0

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python | License: Apache-2.0 | Stargazers: 1850 | Issues: 0