Chen Wu's repositories
cycle-diffusion
[ICCV 2023] A latent space for stochastic diffusion models
unified-generative-zoo
[ICCV 2023] https://arxiv.org/abs/2210.05559
generative-visual-prompt
[NeurIPS 2022] (Amortized) distributional control for pre-trained generative models
agent-attack
[ICLR 2025] Dissecting adversarial robustness of multimodal language model agents
algorithmic-creativity
[ICML 2025] Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Point-Then-Operate
Code for the ACL 2019 paper "A Hierarchical Reinforced Sequence Operation Method for Unsupervised Text Style Transfer"
cliport-batchify
A batched version of CLIPort: What and Where Pathways for Robotic Manipulation
Coupled-VAE
Code for the ACL 2020 paper "On the Encoder-Decoder Incompatibility in Variational Text Modeling and Beyond"
visualwebarena
VisualWebArena is a benchmark for multimodal agents.
DeepSeek-VL
DeepSeek-VL: Towards Real-World Vision-Language Understanding
cambrian
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
prismatic-vlms
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
simpletransformers
Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI