Zhenhailong Wang's repositories
Solo-Performance-Prompting
Repo for paper "Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration"
EEG-To-Text
Code for the AAAI 2022 paper "Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification"
Multitask-Finetuning_CLIP
Code for paper "Rethinking Task Sampling for Few-shot Vision-Language Transfer Learning" COLING 2022 workshop
Wikinews_Pipeline
Get news from Wikipedia page's reference section
MikeWangWZHL.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
1d-tokenizer
This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation
Cutie
[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation
Grounded-Segment-Anything
Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
LLaVA
[NeurIPS 2023 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards multimodal GPT-4 level capabilities.
MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
maze-dataset
maze datasets for investigating OOD behavior of ML systems
parti-pytorch
Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch
rq-vae-transformer
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
sam-hq
Segment Anything in High Quality [NeurIPS 2023]
self-refine
LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.
singularity
Official PyTorch code for Singularity model in the paper "Revealing Single Frame Bias for Video-and-Language Learning"
Tracking-Anything-with-DEVA
Forked from paper [ICCV 2023] Tracking Anything with Decoupled Video Segmentation
VAR
[GPT beats diffusion] [scaling laws in visual generation] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Video-ChatGPT
"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
viper
Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"