SHI Labs

SHI Labs's repositories

OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arXiv 2022 / CVPR 2023

Language: Jupyter Notebook | License: MIT | 1408 20 108

Versatile-Diffusion

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023

Language: Python | License: MIT | 1301 28 34

Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022

Language: Python | License: MIT | 1025 16 75

Prompt-Free-Diffusion

Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models, arXiv 2023 / CVPR 2024

Language: Python | License: MIT | 716 12 25

Matting-Anything

Matting Anything Model (MAM), an efficient and versatile framework for estimating the alpha matte of any instance in an image with flexible and interactive visual or linguistic user prompt guidance.

Language: Python | License: MIT | 570 14 21

Compact-Transformers

Escaping the Big Data Paradigm with Compact Transformers, 2021 (Train your Vision Transformers in 30 mins on CIFAR-10 with a single GPU!)

Language: Python | License: Apache-2.0 | 487 15 64

NATTEN

Neighborhood Attention Extension. Bringing attention to a neighborhood near you!

Language: Cuda | License: NOASSERTION | 323 10 97

Smooth-Diffusion

Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models, arXiv 2023 / CVPR 2024

Language: Python | License: MIT | 281 21 12

VCoder

VCoder: Versatile Vision Encoders for Multimodal Large Language Models, arXiv 2023 / CVPR 2024

Language: Python | License: Apache-2.0 | 250 9 6

Rethinking-Text-Segmentation

[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Language: Python | 241 17 36

[CVPR 2020 & 2021 & 2022 & 2023] Agriculture-Vision Dataset, Prize Challenge and Workshop: A joint effort with many great collaborators to bring Agriculture and Computer Vision/AI communities together to benefit humanity!

194 18 3

FcF-Inpainting

[WACV 2023] Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand

Language: Jupyter Notebook | License: NOASSERTION | 169 11 38

Convolutional-MLPs

[Preprint] ConvMLP: Hierarchical Convolutional MLPs for Vision, 2021

Language: Python | License: Apache-2.0 | 162 4 7

CuMo

CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts

Language: Python | License: Apache-2.0 | 119 1 11

Forget-Me-Not

Forget-Me-Not: Learning to Forget in Text-to-Image Diffusion Models, 2023

Language: Python | License: MIT | 105 8 8

VMFormer

[Preprint] VMFormer: End-to-End Video Matting with Transformer

Language: Python | License: NOASSERTION | 105 8 21

StyleNAT

A new, flexible, and efficient image generation framework that sets a new SOTA on FFHQ-256 with FID 2.05, 2022

Language: Python | License: MIT | 97 6 7

Unsupervised-Domain-Adaptation-with-Differential-Treatment

[CVPR 2020] Differential Treatment for Stuff and Things: A Simple Unsupervised Domain Adaptation Method for Semantic Segmentation

Language: Python | 87 9 14

Text2Video-Zero-sd-webui

Language: Python | License: NOASSERTION | 79 3 9

SH-GAN

[WACV 2023] Image Completion with Heterogeneously Filtered Spectral Hints

Language: Python | 62 5 4

VIM

Language: Python | License: MIT | 52 5 4

CompactNet

Language: Jupyter Notebook | 30 2 1

DiSparse-Multitask-Model-Compression

[CVPR 2022] DiSparse: Disentangled Sparsification for Multitask Model Compression

Language: Jupyter Notebook | 13 2 4

OneFormer-Colab

[Colab Demo Code] OneFormer: One Transformer to Rule Universal Image Segmentation.

Language: Python | License: MIT | 13 2 1

Diffusion-Driven-Test-Time-Adaptation-via-Synthetic-Domain-Alignment

Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment

Language: Python | 1200

Boosted-Dynamic-Networks

Boosted Dynamic Neural Networks, AAAI 2023

Language: Python | License: MIT | 8 20

PAIR-Diffusion

PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models, 2023

Language: Python | License: MIT | 3 10

LIVE-Layerwise-Image-Vectorization

[CVPR 2022 Oral] Towards Layer-wise Image Vectorization

Language: Python | License: Apache-2.0 | 2 10

Text2Video-Zero

A copy of "Text-to-Image Diffusion Models are Zero-Shot Video Generators", ICCV 2023

Language: Python | License: NOASSERTION | 2 10

VideoINR-Continuous-Space-Time-Super-Resolution

[CVPR 2022] VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution

Language: Python | 010