Xin (Eric) Wang's starred repositories
segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
generative-models
Generative Models by Stability AI
gaussian-splatting
Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"
MetaTransformer
Meta-Transformer for Unified Multimodal Learning
Multimodal-GPT
Multimodal-GPT
Neural-Network-Parameter-Diffusion
We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters.
Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
swap-anything
"SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing"
Aerial-Vision-and-Dialog-Navigation
Codebase for the ACL 2023 Findings paper "Aerial Vision-and-Dialog Navigation"
Discffusion
Official repo for the paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"
llm_coordination
Code repository for the paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"
MultipanelVQA
Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"
Naivgation-as-wish
Official implementation of the NAACL 2024 paper "Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning"