kimx3966

Daniel Y.T. Kim's repositories

Adv-Diffusion

[AAAI-2024] Official code for the work "Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model"

Language: Python

AlignProp uses direct reward backpropagation to align large-scale text-to-image diffusion models. Our method is 25x more sample- and compute-efficient than reinforcement learning methods (PPO) for finetuning Stable Diffusion.

MIT

animate-your-word

Apache-2.0

cog-sdxl-clip-interrogator

An attempt at a Cog wrapper for an SDXL CLIP Interrogator


CoMat

Official code for 💫CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching


continuous_3d_words_code


CSD

MIT

DGInStyle

DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control

Apache-2.0

DragAPart

Official Implementation of DragAPart: Learning a Part-Level Motion Prior for Articulated Objects.


ELLA

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Apache-2.0

Face-Adapter


Face2Diffusion

[CVPR 2024] Face2Diffusion for Fast and Editable Face Personalization https://arxiv.org/abs/2403.05094

NOASSERTION

graph-of-thoughts

Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models"

NOASSERTION

HeadStudio

HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting.

MIT

imagetoprompt-ai

Turn your images into detailed and descriptive text prompts with AI

MIT

langchain-kr

A Korean-language tutorial based on the official LangChain documentation, the Cookbook, and other practical examples. This tutorial shows how to use LangChain more easily and effectively.


LDM-Diffusion-sem

MIT

LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Apache-2.0

MiniGPT-4

Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)

BSD-3-Clause

mm-cot

Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned and more will be updated)

Apache-2.0

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Apache-2.0

NLLB-200-Distilled-350M-en-ko

NLLB-200 distilled 350M model for English-to-Korean translation

NOASSERTION

OneTrainer

OneTrainer is a one-stop solution for all your stable diffusion training needs.

AGPL-3.0

photoswap

Official implementation of the NeurIPS 2023 paper "Photoswap: Personalized Subject Swapping in Images"

MIT

PVA-CelebAHQ-IDI

Parallel Visual Attention (WACV 2024) and CelebAHQ Identity-Preserving Inpainting dataset repository.


StableDiffusion-CheatSheet

A list of StableDiffusion styles and some notes for offline use. Pure HTML, CSS and a bit of JS.

MIT

stylus

GPL-3.0

T-GATE

T-GATE: Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

MIT

tilemaker


VAST

Code and Model for VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

MIT