OedoSoldier

OedoSoldier's starred repositories

minGPT

A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training

Language: Python · License: MIT · Stargazers: 18818 · Issues: 255 · Issues: 70

vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Language: Python · License: MIT · Stargazers: 17973 · Issues: 141 · Issues: 251

Swin-Transformer

This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".

Language: Python · License: MIT · Stargazers: 12945 · Issues: 126 · Issues: 298

PyTorch-VAE

A Collection of Variational Autoencoders (VAE) in PyTorch.

Language: Python · License: Apache-2.0 · Stargazers: 5992 · Issues: 44 · Issues: 79

Segment-Everything-Everywhere-All-At-Once

[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"

Language: Python · License: Apache-2.0 · Stargazers: 4036 · Issues: 56 · Issues: 132

SUPIR

SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild

Language: Python · License: NOASSERTION · Stargazers: 3380 · Issues: 66 · Issues: 102

LayerDiffuse

Transparent Image Layer Diffusion using Latent Transparency

VLM_survey

Collection of AWESOME vision-language models for vision tasks

Personalize-SAM

Personalize Segment Anything Model (SAM) with 1 shot in 10 seconds

Language: Python · License: MIT · Stargazers: 1420 · Issues: 27 · Issues: 44

prismer

The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".

Language: Python · License: NOASSERTION · Stargazers: 1287 · Issues: 15 · Issues: 19

X-Decoder

[CVPR 2023] Official Implementation of X-Decoder for generalized decoding for pixel, image and language

Language: Python · License: Apache-2.0 · Stargazers: 1246 · Issues: 34 · Issues: 64

UniRepLKNet

[CVPR'24] UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition

Language: Python · License: Apache-2.0 · Stargazers: 807 · Issues: 12 · Issues: 15

DAT

Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention

Language: Python · License: Apache-2.0 · Stargazers: 693 · Issues: 13 · Issues: 34

Matting-Anything

Matting Anything Model (MAM): an efficient and versatile framework for estimating the alpha matte of any instance in an image, guided by flexible and interactive visual or linguistic user prompts.

Language: Python · License: MIT · Stargazers: 535 · Issues: 13 · Issues: 21

specter

SPECTER: Document-level Representation Learning using Citation-informed Transformers

Language: Python · License: Apache-2.0 · Stargazers: 493 · Issues: 19 · Issues: 40

APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Language: Python · License: Apache-2.0 · Stargazers: 416 · Issues: 7 · Issues: 34

DCNv4

[CVPR 2024] Deformable Convolution v4

Language: Python · License: MIT · Stargazers: 323 · Issues: 3 · Issues: 41

A-ViT

Official PyTorch implementation of A-ViT: Adaptive Tokens for Efficient Vision Transformer (CVPR 2022)

Language: Python · License: Apache-2.0 · Stargazers: 131 · Issues: 4 · Issues: 14

NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Language: Python · License: MIT · Stargazers: 121 · Issues: 6 · Issues: 2

Adversarial-Contrastive-Learning

[NeurIPS 2020] “Robust Pre-Training by Adversarial Contrastive Learning”, Ziyu Jiang, Tianlong Chen, Ting Chen, Zhangyang Wang

Modality-Gap

Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning

Language: Jupyter Notebook · License: MIT · Stargazers: 96 · Issues: 5 · Issues: 3

SAMFeat

The official implementation of “Segment Anything Model is a Good Teacher for Local Feature Learning”.

Language: Python · License: MIT · Stargazers: 95 · Issues: 6 · Issues: 3

M3ViT

[NeurIPS 2022] “M³ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design”, Hanxue Liang*, Zhiwen Fan*, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang

Language: Python · License: MIT · Stargazers: 70 · Issues: 10 · Issues: 4

SimViT

[ICME 2022] Code for the paper "SimViT: Exploring a simple vision transformer with sliding windows".

Language: Python · Stargazers: 62 · Issues: 0 · Issues: 0

ARGF_multimodal_fusion

Code for "Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion"

OADis

Official code for "Disentangling Visual Embeddings for Attributes and Objects" Published at CVPR 2022

MixViT

[Pattern Recognition] Mix-ViT: Mixing Attentive Vision Transformer for Ultra-Fine-Grained Visual Categorization.