eric-xw

Xin (Eric) Wang's starred repositories

segment-anything

The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

Language: Jupyter Notebook · License: Apache-2.0 · 45983 303 658

llama3

The official Meta Llama 3 GitHub site

Language: Python · License: NOASSERTION · 25121 206 215

generative-models

Generative Models by Stability AI

Language: Python · License: MIT · 23635 251 289

gaussian-splatting

Original reference implementation of "3D Gaussian Splatting for Real-Time Radiance Field Rendering"

Language: Python · License: NOASSERTION · 13031 112 856

SWE-agent

SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4 or your LM of choice. It solves 12.47% of bugs in the SWE-bench evaluation set and takes just 1 minute to run.

Language: Python · License: MIT · 12181 87 342

MemGPT

Create LLM agents with long-term memory and custom tools 📚🦙

Language: Python · License: Apache-2.0 · 11006 113 684

Voyager

An Open-Ended Embodied Agent with Large Language Models

Language: JavaScript · License: MIT · 5411 62 143

OpenAGI

OpenAGI: When LLM Meets Domain Experts

Language: Python · License: MIT · 1870 27 16

MetaTransformer

Meta-Transformer for Unified Multimodal Learning

Language: Python · License: Apache-2.0 · 1476 22 65

Neural-Network-Parameter-Diffusion

We introduce a novel approach for parameter generation, named neural network parameter diffusion (p-diff), which employs a standard latent diffusion model to synthesize a new set of parameters

Language: Python · 795 19 20

AWSIM

Open source simulator for self-driving vehicles

Language: C# · License: NOASSERTION · 483 55 96

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Language: Python · 424 6 21

swap-anything

"SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing"

197 31 1

MatrixCity

Language: Python · License: Apache-2.0 · 187 10 37

LLMScore

LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation

Language: Python · 119 3 7

TIP

Multimodal-Procedural-Planning

Language: Python · 90 3 4

Aerial-Vision-and-Dialog-Navigation

Codebase of ACL 2023 Findings "Aerial Vision-and-Dialog Navigation"

Language: Python · 32 2 13

ComCLIP

Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"

Language: Python · License: MIT · 27 30

Discffusion

Official repo for the paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"

Language: Python · License: MIT · 25 20

llm_coordination

Code repository for the paper "LLM-Coordination: Evaluating and Analyzing Multi-agent Coordination Abilities in Large Language Models"

Language: Python · License: MIT · 18 40

MMWorld

Official repo of the paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"

Language: Python · License: MIT · 16 10

ProbMed

"Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA"

Language: Python · 11 10

T2IAT

T2IAT: Measuring Valence and Stereotypical Biases in Text-to-Image Generation

Language: Python · License: MIT · 7 40

MultipanelVQA

Code for the MultipanelVQA benchmark "Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA"

Language: Jupyter Notebook · License: MIT · 6 30

PECTVLM

Code implementation for Findings of EMNLP 2023 paper "Parameter-Efficient Cross-lingual Transfer of Vision and Language Models via Translation-based Alignment"

Language: Smalltalk · License: MIT · 6 30

Naivgation-as-wish

Official implementation of the NAACL 2024 paper "Navigation as Attackers Wish? Towards Building Robust Embodied Agents under Federated Learning"

Language: Python · License: MIT · 4 20

R2H

Official implementation of the EMNLP 2023 paper "R2H: Building Multimodal Navigation Helpers that Respond to Help Requests"

Language: Python · 4 30