Beast code in Giters

CatVTON

CatVTON is a simple and efficient virtual try-on diffusion model with 1) a lightweight network (899.06M parameters in total), 2) parameter-efficient training (49.57M trainable parameters), and 3) simplified inference (< 8 GB VRAM at 1024×768 resolution).

Language: Python | License: NOASSERTION | 789 | 0 | 0

Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Language: Python | License: Apache-2.0 | 2433 | 0 | 0

MuseTalk

MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting

Language: Python | License: NOASSERTION | 2509 | 0 | 0

CSGO

CSGO: Content-Style Composition in Text-to-Image Generation 🔥

Language: Jupyter Notebook | 227 | 0 | 0

GeoWizard

[ECCV'24] GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image

Language: Python | 721 | 0 | 0

MagicFixup

Language: Python | License: NOASSERTION | 131 | 0 | 0

Paints-UNDO

Understand Human Behavior to Align True Needs

Language: Python | License: Apache-2.0 | 3313 | 0 | 0

SimpleTuner

A general fine-tuning kit geared toward diffusion models.

Language: Python | License: AGPL-3.0 | 1606 | 0 | 0

MindChat

🐋 MindChat (漫谈, "casual talk"): a large language model for psychology. Chatting along life's road, smiling through wind and frost.

Language: Python | License: GPL-3.0 | 584 | 0 | 0

CLUECorpus2020

Large-scale Pre-training Corpus for Chinese: a 100 GB Chinese pre-training corpus

License: MIT | 914 | 0 | 0

VADER

Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models, such as VideoCrafter, OpenSora, ModelScope, and StableVideoDiffusion, by fine-tuning them with various reward models, such as HPS, PickScore, VideoMAE, VJEPA, YOLO, and Aesthetics.

Language: Python | 197 | 0 | 0

Open-Sora-Plan

This project aims to reproduce Sora (OpenAI's T2V model); we hope the open-source community will contribute to it.

Language: Python | License: MIT | 11292 | 0 | 0

MoneyPrinterTurbo

Generate high-definition short videos with one click using AI large language models.

Language: Python | License: MIT | 16331 | 0 | 0

UniPortrait

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalizations

License: Apache-2.0 | 168 | 0 | 0

x-flux

Language: Python | License: Apache-2.0 | 1433 | 0 | 0

InternVideo

[ECCV2024] Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | 1331 | 0 | 0

MimicMotion

High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

Language: Python | License: NOASSERTION | 1677 | 0 | 0

VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Language: Python | License: Apache-2.0 | 1847 | 0 | 0

InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry-leading WebUI, supports terminal use through a CLI, and serves as the foundation for multiple commercial products.

Language: TypeScript | License: Apache-2.0 | 23085 | 0 | 0

YouTaoBaBa

YouTaoBaBa's starred repositories

Euler-Smea-Dyn-Sampler

bypy

ComfyUI-FluxTrainer

HivisionIDPhotos

flash-attention

ChineseNLPCorpus

CogVLM2

RB-Modulation

CatVTON