zhangtao's repositories
bubogpt
BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
DragGAN
Code for DragGAN (SIGGRAPH 2023)
fc-clip
This repo contains the code for our paper "Convolutions Die Hard: Open-Vocabulary Panoptic Segmentation with Single Frozen Convolutional CLIP"
HIPIE
Code release for "Hierarchical Open-vocabulary Universal Image Segmentation"
InternLM
InternLM has open-sourced 7-billion- and 20-billion-parameter base and chat models.
LLaVA
Visual Instruction Tuning: Large Language-and-Vision Assistant built towards multimodal GPT-4 level capabilities.
Mask2Former
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
OMG-Seg
OMG-LLaVA and OMG-Seg codebase
OmniScient-Model
This repo contains the code for our paper "Towards Open-Ended Visual Recognition with Large Language Model"
OpenSeeD
A Simple Framework for Open-Vocabulary Segmentation and Detection
Osprey
The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
Semantic-SAM
Official implementation of the paper "Semantic-SAM: Segment and Recognize Anything at Any Granularity"
subobjects
Official repository of paper "Subobject-level Image Tokenization"
VAR
[GPT beats diffusion] [scaling laws in visual generation] Official implementation of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction"
zhang-tao-whu.github.io
AcadHomepage: A Modern and Responsive Academic Personal Homepage