Zihan Wang's repositories
AlphaCLIP
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Awesome-Multimodal-Large-Language-Models
Latest papers and datasets on Multimodal Large Language Models and their evaluation.
awesome-segment-anything
Tracking and collecting papers/projects/others related to Segment Anything.
BLIP
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
dynibar
Implementation of "DynIBaR: Neural Dynamic Image-Based Rendering" (CVPR 2023)
LAVIS
LAVIS - A One-stop Library for Language-Vision Intelligence
LISA
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
llama
Inference code for LLaMA models
Metaworld
A collection of robotics environments geared towards benchmarking multi-task and meta reinforcement learning
MONAI
AI Toolkit for Healthcare Imaging
Neural-Scene-Flow-Fields
PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes"
open_clip
An open source implementation of CLIP.
projectaria_tools
projectaria_tools is a C++/Python open-source toolkit for interacting with Project Aria data
prompt-dt
Official code repository for Prompt-DT.
pypose
Connects classic robotics with modern learning methods seamlessly.
VIMA
Official Algorithm Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR 2023
visprog
Official code for VisProg (CVPR 2023 Best Paper!)
visual_gpt_score
VisualGPTScore for visio-linguistic reasoning