Guangrun Wang (王广润)'s repositories
Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Depth-Anything
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
AnyDoor
Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"
autogen
A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
CLIP
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
dinov2
PyTorch code and models for the DINOv2 self-supervised learning method.
DIS
This is the repository for our new project, Highly Accurate Dichotomous Image Segmentation
FLatten-Transformer
Official repository of FLatten Transformer (ICCV2023)
GPT-4V-Act
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
humannerf
HumanNeRF turns a monocular video of moving people into a 360-degree free-viewpoint video.
IDM-VTON
IDM-VTON: Improving Diffusion Models for Authentic Virtual Try-on in the Wild
im-server
Instant messaging (IM) system
inpaint-anything
Inpaint Anything performs stable diffusion inpainting on a browser UI using masks from Segment Anything.
IP-Adapter
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
ladi-vton
This is the official repository for the paper "LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On".
llama
Inference code for LLaMA models
LLaMA2-Accessory
An Open-source Toolkit for LLM Development
OOTDiffusion
Official implementation of OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
PeRF
[Technical Report 2023] PERF: Panoramic Neural Radiance Field from a Single Panorama
pyllama
LLaMA: Open and Efficient Foundation Language Models
pytorch-image-models-v2
PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more
rcg
PyTorch implementation of RCG (https://arxiv.org/abs/2312.03701)
torch-ngp
A PyTorch CUDA extension implementation of instant-ngp (SDF and NeRF), with a GUI.
VAR
[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction"
ViT-Adapter
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions