There are 8 repositories under the vision-transformer topic.
CV backbones including GhostNet, TinyNet and TNT, developed by Huawei Noah's Ark Lab.
SwinIR: Image Restoration Using Swin Transformer (official repository)
This repository contains demos I made with the Transformers library by HuggingFace.
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Scenic: A Jax Library for Computer Vision Research and Beyond
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer-based networks.
[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
VRT: A Video Restoration Transformer (official repository)
A framework that provides a simple API for developing ML-driven data processing and search pipelines.
Official PyTorch implementation of the paper "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021)
Attention modules commonly used in CV; plug-and-play modules; ViT models. PyTorch implementation collection of attention modules and plug-and-play modules.
This is an official implementation for "Contextual Transformer Networks for Visual Recognition".
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Explainability for Vision Transformers
A collection of papers about Transformers in the field of medical image analysis.
SOTA Semantic Segmentation Models in PyTorch
PyTorch version of Vision Transformer (ViT) with pretrained models. This is part of the CASL (https://casl-project.github.io/) and ASYML projects.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".
Awesome Transformers (self-attention) in Computer Vision
[NeurIPS 2021] Global Filter Networks for Image Classification
The official code for the paper: https://arxiv.org/pdf/2108.00154.pdf
MPViT: Multi-Path Vision Transformer for Dense Prediction (CVPR 2022)
MIMDet: Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
TensorFlow port of PyTorch Image Models (timm) - image models with pretrained weights.
TensorFlow implementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
A collection of visualization functions
Official PyTorch implementation of "Splicing ViT Features for Semantic Appearance Transfer", presenting "Splice" (CVPR 2022 Oral)
Official PyTorch Implementation of SAM-DETR (CVPR 2022)
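Vision Transformer for 3D medical image registration (PyTorch).
PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, BEiT, and MAE, as well as foundational vision algorithms such as Vision Transformer, DEiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.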