Zhe Gan (zhegan27)


Company: Microsoft

Zhe Gan's starred repositories

X-Decoder

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Language: Python · License: Apache-2.0 · Stars: 1266 · Issues: 0

GRiT

GRiT: A Generative Region-to-text Transformer for Object Understanding (https://arxiv.org/abs/2212.00280)

Language: Python · License: MIT · Stars: 281 · Issues: 0

FIBER

Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Language: Python · License: MIT · Stars: 124 · Issues: 0

mindall-e

PyTorch implementation of a 1.3B text-to-image generation model trained on 14 million image-text pairs

Language: Python · License: NOASSERTION · Stars: 630 · Issues: 0

GenerativeImage2Text

GIT: A Generative Image-to-text Transformer for Vision and Language

Language: Python · License: MIT · Stars: 528 · Issues: 0

UniTAB

UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)

Language: Python · License: MIT · Stars: 82 · Issues: 0

BEVT

PyTorch implementation of BEVT (CVPR 2022) https://arxiv.org/abs/2112.01529

Language: Python · License: Apache-2.0 · Stars: 151 · Issues: 0

ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions

Language: Python · License: Apache-2.0 · Stars: 1155 · Issues: 0

SwinBERT

Research code for CVPR 2022 paper "SwinBERT: End-to-End Transformers with Sparse Attention for Video Captioning"

Language: Python · License: MIT · Stars: 232 · Issues: 0

SLIP

Code release for SLIP: Self-supervision meets Language-Image Pre-training

Language: Python · License: MIT · Stars: 731 · Issues: 0

DallEval

DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models (ICCV 2023)

Language: Jupyter Notebook · License: MIT · Stars: 135 · Issues: 0

OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Language: Python · License: Apache-2.0 · Stars: 2357 · Issues: 0

Detic

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".

Language: Python · License: Apache-2.0 · Stars: 1801 · Issues: 0

BLIP

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Language: Jupyter Notebook · License: BSD-3-Clause · Stars: 4383 · Issues: 0

pytorch_violet

A PyTorch implementation of VIOLET

Language: Python · Stars: 136 · Issues: 0

ibot

iBOT: Image BERT Pre-Training with Online Tokenizer (ICLR 2022)

Language: Jupyter Notebook · License: Apache-2.0 · Stars: 632 · Issues: 0

GLIP

Grounded Language-Image Pre-training

Language: Python · License: MIT · Stars: 2025 · Issues: 0

mlp-vil

MLPs for Vision and Language Modeling (Coming Soon)

License: MIT · Stars: 27 · Issues: 0

METER

METER: A Multimodal End-to-end TransformER Framework

Language: Python · License: MIT · Stars: 355 · Issues: 0

Stable-Pix2Seq

A full-fledged version of Pix2Seq

Language: Python · License: Apache-2.0 · Stars: 234 · Issues: 0

CV_A-FAN

[TMLR] "Adversarial Feature Augmentation and Normalization for Visual Recognition", Tianlong Chen, Yu Cheng, Zhe Gan, Jianfeng Wang, Lijuan Wang, Zhangyang Wang, Jingjing Liu

Language: Python · License: MIT · Stars: 20 · Issues: 0

VL-T5

PyTorch code for "Unifying Vision-and-Language Tasks via Text Generation" (ICML 2021)

Language: Python · License: MIT · Stars: 354 · Issues: 0

VidLanKD

PyTorch version of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer (NeurIPS 2021)

Language: Python · Stars: 56 · Issues: 0

P-tuning

A novel method to tune language models. Code and datasets for the paper "GPT Understands, Too".

Language: Python · License: MIT · Stars: 901 · Issues: 0

ALBEF

Code for ALBEF: a new vision-language pre-training method

Language: Python · License: BSD-3-Clause · Stars: 1427 · Issues: 0

CoOp

Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)

Language: Python · License: MIT · Stars: 1538 · Issues: 0

Focal-Transformer

[NeurIPS 2021 Spotlight] Official code for "Focal Self-attention for Local-Global Interactions in Vision Transformers"

Language: Python · License: MIT · Stars: 543 · Issues: 0