daoyuan98's starred repositories

stable-diffusion

A latent text-to-image diffusion model

Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 66496

roop

one-click face swap

Language: Python · License: GPL-3.0 · Stargazers: 25612

sd-webui-roop

roop extension for StableDiffusion web-ui

Language: Python · License: AGPL-3.0 · Stargazers: 3297

awesome-RLHF

A curated list of reinforcement learning with human feedback resources (continually updated)

InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Language: Python · License: Apache-2.0 · Stargazers: 1682

viper

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

Language: Jupyter Notebook · License: NOASSERTION · Stargazers: 1633

tomesd

Speed up Stable Diffusion with this one simple trick!

Language: Python · License: MIT · Stargazers: 1231

LLMs_interview_notes

This repository mainly collects interview questions for large language model (LLM) algorithm engineers.

LLM-in-Vision

Recent LLM-based CV and related works. Welcome to comment/contribute!

groundingLMM

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Language: Python · License: Apache-2.0 · Stargazers: 534

SEED

Official implementation of SEED-LLaMA (ICLR 2024).

Language: Python · License: NOASSERTION · Stargazers: 516

CLIP-SAM

An experiment combining CLIP with SAM for open-vocabulary image segmentation.

Language: Jupyter Notebook · Stargazers: 317

diffusion-rig

Code Release for DiffusionRig (CVPR 2023)

Language: Python · License: NOASSERTION · Stargazers: 251

LRV-Instruction

[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

Language: Python · License: BSD-3-Clause · Stargazers: 231

RLHF-V

[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

ViP-LLaVA

[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts

Language: Python · License: Apache-2.0 · Stargazers: 148

ViT-Lens

[CVPR 2024] ViT-Lens: Towards Omni-modal Representations

Language: Python · License: NOASSERTION · Stargazers: 138

pvic

Official PyTorch implementation for ICCV2023 paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"

Language: Python · License: BSD-3-Clause · Stargazers: 57

Efficient-LLM-Survey

The Efficiency Spectrum of LLMs

MMVP-motion-matrix-based-video-prediction

This is the official repo of MMVP: motion-matrix-based video prediction (ICCV 2023)

Language: Python · License: MIT · Stargazers: 32

Skeleton-in-Context

[CVPR2024] Official implementation of the paper: Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning

Language: Python · Stargazers: 21

Symbol-LLM

Code for NeurIPS2023 Paper "Symbol-LLM: Leverage Language Models for Symbolic System in Visual Human Activity Reasoning"

Language: Python · Stargazers: 18

PELA

PELA: Learning Parameter-Efficient Models with Low-Rank Approximation [CVPR 2024]

Language: Python · License: Apache-2.0 · Stargazers: 9

CaesarNeRF

This repo is for CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering.

Language: Python · Stargazers: 7