xyxxmb's starred repositories

GLM-4

GLM-4 series: Open-Source Multilingual Multimodal Chat LMs

Language: Python | License: Apache-2.0 | Stargazers: 3598 | Issues: 0

PAE

[CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation

Language: Python | Stargazers: 45 | Issues: 0

Omost

Your image is almost there!

Language: Python | License: Apache-2.0 | Stargazers: 6880 | Issues: 0

aesthetic-predictor-v2-5

SigLIP-based Aesthetic Score Predictor

Language: Python | License: AGPL-3.0 | Stargazers: 81 | Issues: 0

Pandora

Pandora: Towards General World Model with Natural Language Actions and Video States

Language: Python | Stargazers: 431 | Issues: 0

Language: Python | License: NOASSERTION | Stargazers: 256 | Issues: 0

CraftsMan

CraftsMan: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner

Language: Python | Stargazers: 331 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 338 | Issues: 0

CLoT

[CVPR 2024] Official codebase of the paper "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".

Language: Python | Stargazers: 260 | Issues: 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 7911 | Issues: 0

HiDiffusion

[ECCV 2024] HiDiffusion: Increases the resolution and speed of your diffusion model by only adding a single line of code!

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 684 | Issues: 0

Glyph-ByT5

[ECCV 2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"

License: Apache-2.0 | Stargazers: 391 | Issues: 0

llama3-from-scratch

Llama 3 implementation, one matrix multiplication at a time

Language: Jupyter Notebook | License: MIT | Stargazers: 11119 | Issues: 0

MiniCPM

MiniCPM-2B: An end-side LLM outperforming Llama2-13B.

Language: Python | License: Apache-2.0 | Stargazers: 4412 | Issues: 0

MoMA

MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation

Language: Jupyter Notebook | Stargazers: 160 | Issues: 0

HunyuanDiT

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Language: Python | License: NOASSERTION | Stargazers: 2728 | Issues: 0

PuLID_ComfyUI

PuLID native implementation for ComfyUI

Language: Python | License: Apache-2.0 | Stargazers: 436 | Issues: 0

Language: Jupyter Notebook | License: MIT | Stargazers: 114 | Issues: 0

Language: Python | Stargazers: 82 | Issues: 0

MagicDance

[ICML 2024] MagicPose (also known as MagicDance): Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion

Language: Python | License: NOASSERTION | Stargazers: 620 | Issues: 0

StoryDiffusion

Create Magic Story!

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 5457 | Issues: 0

Vitron

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Language: Python | Stargazers: 244 | Issues: 0

comfyui-portrait-master-zh-cn

Portrait Master, Chinese edition of comfyui-portrait-master

Language: Python | License: GPL-3.0 | Stargazers: 1498 | Issues: 0

Fooocus

Focus on prompting and generating

Language: Python | License: GPL-3.0 | Stargazers: 38168 | Issues: 0

maia

Official implementation of MAIA, a Multimodal Automated Interpretability Agent

Language: Python | Stargazers: 39 | Issues: 0

Awesome-Multimodal-Large-Language-Models

Latest Advances on Multimodal Large Language Models

Stargazers: 10548 | Issues: 0

ComfyUI-YoloWorld-EfficientSAM

Unofficial implementation of YOLO-World + EfficientSAM for ComfyUI

Language: Python | License: GPL-3.0 | Stargazers: 490 | Issues: 0

Autonomous-Agents

Autonomous Agents (LLMs) research papers. Updated Daily.

License: MIT | Stargazers: 270 | Issues: 0

Grounded-Segment-Anything

Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 14142 | Issues: 0

SEED-X

Multimodal Models in Real World

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 299 | Issues: 0