OedoSoldier

OedoSoldier's starred repositories

minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Language: Python · License: MIT · Stargazers: 18818 · Issues: 255 · Issues: 70

vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Language: Python · License: MIT · Stargazers: 17973 · Issues: 141 · Issues: 251

Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Language: Python · License: MIT · Stargazers: 12945 · Issues: 126 · Issues: 298

PyTorch-VAE

A Collection of Variational Autoencoders (VAE) in PyTorch.

Language: Python · License: Apache-2.0 · Stargazers: 5992 · Issues: 44 · Issues: 79

Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Language: Python · License: Apache-2.0 · Stargazers: 4036 · Issues: 56 · Issues: 132

SUPIR

SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild

Language: Python · License: NOASSERTION · Stargazers: 3380 · Issues: 66 · Issues: 102

LayerDiffuse

Transparent Image Layer Diffusion using Latent Transparency

VLM_survey

Collection of AWESOME vision-language models for vision tasks

Personalize-SAM

Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds

Language: Python · License: MIT · Stargazers: 1420 · Issues: 27 · Issues: 44

prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Language: Python · License: NOASSERTION · Stargazers: 1287 · Issues: 15 · Issues: 19

X-Decoder

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Language: Python · License: Apache-2.0 · Stargazers: 1246 · Issues: 34 · Issues: 64

UniRepLKNet

[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Language: Python · License: Apache-2.0 · Stargazers: 807 · Issues: 12 · Issues: 15

DAT

Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

Language: Python · License: Apache-2.0 · Stargazers: 693 · Issues: 13 · Issues: 34

Matting-Anything

Matting Anything Model (MAM): an efficient and versatile framework for estimating the alpha matte of any instance in an image, guided by flexible and interactive visual or linguistic user prompts.

Language: Python · License: MIT · Stargazers: 535 · Issues: 13 · Issues: 21

specter

SPECTER: Document-level Representation Learning using Citation-informed Transformers

Language: Python · License: Apache-2.0 · Stargazers: 493 · Issues: 19 · Issues: 40

APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Language: Python · License: Apache-2.0 · Stargazers: 416 · Issues: 7 · Issues: 34

DCNv4

[CVPR 2024] Deformable Convolution v4

Language: Python · License: MIT · Stargazers: 323 · Issues: 3 · Issues: 41

A-ViT

Official PyTorch implementation of A-ViT: Adaptive Tokens for Efficient Vision Transformer (CVPR 2022)

Language: Python · License: Apache-2.0 · Stargazers: 131 · Issues: 4 · Issues: 14

NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Language: Python · License: MIT · Stargazers: 121 · Issues: 6 · Issues: 2

Adversarial-Contrastive-Learning

[NeurIPS 2020] “Robust Pre-Training by Adversarial Contrastive Learning”, Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

Modality-Gap

Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning

Language: Jupyter Notebook · License: MIT · Stargazers: 96 · Issues: 5 · Issues: 3

SAMFeat

The official implementation of “Segment Anything Model is a Good Teacher for Local Feature Learning”.

Language: Python · License: MIT · Stargazers: 95 · Issues: 6 · Issues: 3

M3ViT

[NeurIPS 2022] “M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design”, Hanxue Liang*, Zhiwen Fan*, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang

Language: Python · License: MIT · Stargazers: 70 · Issues: 10 · Issues: 4

SimViT

[ICME 2022] Code for the paper "SimViT: Exploring a simple vision transformer with sliding windows".

Language: Python · Stargazers: 62 · Issues: 0 · Issues: 0

ARGF_multimodal_fusion

Code for "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion"

OADis

Official code for "Disentangling Visual Embeddings for Attributes and Objects" Published at CVPR 2022

MixViT

[Pattern Recognition] Mix-ViT: Mixing Attentive Vision Transformer for Ultra-Fine-Grained Visual Categorization.