felixfuu's starred repositories

DisenDiff

[CVPR 2024, Oral] Attention Calibration for Disentangled Text-to-Image Personalization

Language: Python | License: MIT | Stargazers: 59 | Issues: 0

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python | License: MIT | Stargazers: 3745 | Issues: 0

InstructCV

[ICLR 2024] Official Codebase for "InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists"

Language: Python | License: NOASSERTION | Stargazers: 515 | Issues: 0

CosmicMan

CosmicMan: A Text-to-Image Foundation Model for Humans (CVPR 2024)

Language: Python | Stargazers: 254 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 183 | Issues: 0

img2dataset

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

Language: Python | License: MIT | Stargazers: 3414 | Issues: 0

torch-LLM4SGG

Official PyTorch implementation of "LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation", accepted at CVPR 2024

Language: Python | Stargazers: 64 | Issues: 0

FreeCustom

[CVPR 2024] Official PyTorch implementation of FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

Language: Python | License: MIT | Stargazers: 58 | Issues: 0

T-Rex

API for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | Stargazers: 1970 | Issues: 0

DINOv

[CVPR 2024] Official implementation of the paper "Visual In-context Learning"

Language: Python | Stargazers: 304 | Issues: 0

Osprey

[CVPR 2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Language: Python | License: Apache-2.0 | Stargazers: 712 | Issues: 0

Arc2Face

Arc2Face: A Foundation Model of Human Faces

Language: Python | License: MIT | Stargazers: 482 | Issues: 0

ITI-GEN

[ICCV 2023 Oral, Best Paper Finalist] ITI-GEN: Inclusive Text-to-Image Generation

Language: Python | License: NOASSERTION | Stargazers: 57 | Issues: 0

Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

Language: Python | License: Apache-2.0 | Stargazers: 19558 | Issues: 0

SEED

Official implementation of SEED-LLaMA (ICLR 2024).

Language: Python | License: NOASSERTION | Stargazers: 514 | Issues: 0

LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Language: Python | Stargazers: 253 | Issues: 0

grok-1

Grok open release

Language: Python | License: Apache-2.0 | Stargazers: 49129 | Issues: 0

eclipse-inference

[CVPR 2024] Official PyTorch implementation of "ECLIPSE: Revisiting the Text-to-Image Prior for Efficient Image Generation"

Language: Python | License: MIT | Stargazers: 59 | Issues: 0

LLM-groundedDiffusion

LLM-grounded Diffusion (LMD): Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models

Language: Python | Stargazers: 372 | Issues: 0

Griffon

The official repo of Griffon

Language: Python | License: Apache-2.0 | Stargazers: 79 | Issues: 0

LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Language: Python | License: Apache-2.0 | Stargazers: 1597 | Issues: 0

gill

🐟 Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models".

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 400 | Issues: 0

Awesome-Controllable-T2I-Diffusion-Models

A collection of resources on controllable generation with text-to-image diffusion models.

License: MIT | Stargazers: 730 | Issues: 0

minisora

MiniSora: a community that aims to explore the implementation path and future development direction of Sora.

Language: Python | License: Apache-2.0 | Stargazers: 1087 | Issues: 0

Hulk

An official implementation of "Hulk: A Universal Knowledge Translator for Human-Centric Tasks"

Language: Python | License: MIT | Stargazers: 64 | Issues: 0

llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 33525 | Issues: 0

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | License: Apache-2.0 | Stargazers: 10634 | Issues: 0

TagAlign

Official implementation of TagAlign

Language: Python | Stargazers: 31 | Issues: 0

FastV

Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Language: Python | Stargazers: 151 | Issues: 0

Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

License: MIT | Stargazers: 1957 | Issues: 0