JiahaoTian-sjtu

JiahaoTian's starred repositories

flux

Official inference repo for FLUX.1 models

Language: Python · License: Apache-2.0 · Stars: 15,661

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)

Language: Python · License: Apache-2.0 · Stars: 44,010

GlyphDraw2

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

Language: Python · License: MIT · Stars: 46

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 47,486

OCR-SAM

Combines MMOCR with Segment Anything and Stable Diffusion to automatically detect, recognize, and segment text instances, supporting several downstream tasks such as text removal and text inpainting.

Language: Python · Stars: 531

PixArt-alpha

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

Language: Python · License: Apache-2.0 · Stars: 2,782

scepter

SCEPTER is an open-source framework used for training, fine-tuning, and inference with generative models.

Language: Python · License: Apache-2.0 · Stars: 419

lora

Using low-rank adaptation (LoRA) to quickly fine-tune diffusion models.

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 7,043
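The `lora` entry above only names the technique. As a rough illustration, assuming the standard LoRA formulation (a frozen pretrained weight W plus a trainable low-rank update scaled by alpha / r, with B initialized to zero), a pure-Python sketch of one adapted linear layer might look like this. The names `LoRALinear`, `matvec`, and `trainable_params` are hypothetical and are not the API of the `lora` repository.

```python
# Hypothetical sketch of a LoRA-adapted linear layer in pure Python.
# Assumption: standard LoRA, y = W x + (alpha / r) * B (A x); this is NOT
# the API of the `lora` repository, just an illustration of the idea.
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, never updated
        # A (rank x d_in) gets a small random init; B (d_out x rank) starts at
        # zero, so the adapted layer initially behaves exactly like the base one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        base = matvec(self.weight, x)               # W x
        delta = matvec(self.B, matvec(self.A, x))   # B (A x), never forms B @ A
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        """Only A and B are trained: rank * (d_in + d_out) values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


if __name__ == "__main__":
    W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
    layer = LoRALinear(W, rank=1, alpha=1.0)
    print(layer.forward([1.0, 2.0, 3.0]))  # [7.0, -1.0], since B starts at zero
    print(layer.trainable_params())        # 5, versus 6 entries in W
```

The saving only becomes meaningful for large layers: with rank r much smaller than d_in and d_out, training r * (d_in + d_out) values instead of d_in * d_out is what makes this kind of fine-tuning fast.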

Glyph-ByT5

[ECCV 2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering".

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 504

transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Language: Python · License: Apache-2.0 · Stars: 134,501

pykan

Kolmogorov-Arnold Networks

Language: Jupyter Notebook · License: MIT · Stars: 15,000

Diffusion-Tryon-Trainer

Language: Python · License: NOASSERTION · Stars: 122

VAR

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python · License: MIT · Stars: 4,214

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

Stars: 12,510

controlnet_aux

Language: Python · License: Apache-2.0 · Stars: 390

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python · License: Apache-2.0 · Stars: 22,137

Chinese-CLIP

A Chinese version of CLIP for Chinese cross-modal retrieval and representation generation.

Language: Python · License: MIT · Stars: 4,499

AnimateAnyone-reproduction

Reproduction of AnimateAnyone

Language: Python · Stars: 166

modelscope

ModelScope: bring the notion of Model-as-a-Service to life.

Language: Python · License: Apache-2.0 · Stars: 6,982

MagicDance

[ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

Language: Python · License: NOASSERTION · Stars: 698

Open-AnimateAnyone

Unofficial implementation of Animate Anyone

Language: Python · Stars: 2,932

VisorGPT

[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT

Language: Python · License: MIT · Stars: 131

LLM-groundedDiffusion

LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD, TMLR 2024)

Language: Python · Stars: 431

SLD

🔥 [CVPR 2024] Official implementation of "Self-correcting LLM-controlled Diffusion Models (SLD)"

Language: Python · License: MIT · Stars: 154

Word-As-Image

Language: Python · License: NOASSERTION · Stars: 1,107

DS-Fusion

Code for the DS-Fusion project

Language: Python · License: MIT · Stars: 146

unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Language: Python · License: MIT · Stars: 20,109

MIGC

[CVPR 2024 Highlight] "MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis" (Official Implementation)

Language: Python · License: NOASSERTION · Stars: 535

InstanceDiffusion

[CVPR 2024] Code release for "InstanceDiffusion: Instance-level Control for Image Generation"

Language: Python · License: Apache-2.0 · Stars: 502

GLIGEN

Open-Set Grounded Text-to-Image Generation

Language: Python · License: MIT · Stars: 2,006