MikeWangWZHL

Zhenhailong Wang's repositories

Solo-Performance-Prompting

Repo for paper "Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration"

Language: Python · 292 3 4

EEG-To-Text

Code for AAAI 2022 paper "Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification"

Language: Python · 141 9 11

VidIL

PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"

Language: Python · MIT · 110 5 11

Paxion

Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" NeurIPS 2023 Spotlight

Language: Python · 31 10

VDLM

Repo for paper: Text-based Reasoning About Vector Graphics

Language: Python · 16 10

Zemi

Repo for "Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks" ACL 2023 Findings

Language: Python · 16 4 1

Multitask-Finetuning_CLIP

Code for paper "Rethinking Task Sampling for Few-shot Vision-Language Transfer Learning" COLING 2022 workshop

Language: Python · 3 30

Wikinews_Pipeline

Get news from Wikipedia page's reference section

Language: Python · 300

MikeWangWZHL.github.io

Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

Language: JavaScript · MIT · 1 10

1d-tokenizer

This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation

Apache-2.0 · 000

Cutie

[CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation

Language: Python · MIT · 000

diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch

Language: Python · Apache-2.0 · 010

Grounded-Segment-Anything

Grounded-SAM: Marrying Grounding-DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything

Language: Jupyter Notebook · Apache-2.0 · 000

LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence

Language: Python · BSD-3-Clause · 010

LaVIT

LaVIT: Empower the Large Language Model to Understand and Generate Visual Content

Language: Jupyter Notebook · NOASSERTION · 000

LLaVA

[NeurIPS 2023 Oral] Visual Instruction Tuning: LLaVA (Large Language-and-Vision Assistant) built towards multimodal GPT-4 level capabilities.

Language: Python · Apache-2.0 · 000

MathVista

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

Language: Jupyter Notebook · CC-BY-SA-4.0 · 000

maze-dataset

maze datasets for investigating OOD behavior of ML systems

Language: Jupyter Notebook · 000

parti-pytorch

Implementation of Parti, Google's pure attention-based text-to-image neural network, in Pytorch

MIT · 000

rq-vae-transformer

The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)

Language: Jupyter Notebook · NOASSERTION · 000

sam-hq

Segment Anything in High Quality [NeurIPS 2023]

Language: Python · Apache-2.0 · 000

self-refine

LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively.

Apache-2.0 · 000

singularity

Official PyTorch code for Singularity model in the paper "Revealing Single Frame Bias for Video-and-Language Learning"

MIT · 000

Tracking-Anything-with-DEVA

Forked from paper [ICCV 2023] Tracking Anything with Decoupled Video Segmentation

Language: Python · NOASSERTION · 000

VAR

[GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!

Language: Python · MIT · 000

Video-ChatGPT

"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.

CC-BY-4.0 · 000

viper

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

Language: Jupyter Notebook · NOASSERTION · 000
