Repositories under the vit topic:
pix2tex: Using a ViT to convert images of equations into LaTeX code.
An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
A paper list of some recent Transformer-based CV works.
A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer"
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, ResNet features.
SimpleAICV: PyTorch training and testing examples.
Open-source evaluation toolkit for large vision-language models (LVLMs); supports GPT-4V, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks.
FFCS course registration made hassle-free for VITians. Search courses and visualize the timetable on the go!
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
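The title states the core ViT idea: split an image into fixed-size (e.g. 16x16) patches and treat each flattened patch as a token, the way words are treated in NLP. A minimal numpy sketch of that patchification step (function name and shapes are illustrative, not from any listed repo):

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an image of shape (H, W, C) into non-overlapping patch tokens.

    Returns an array of shape (num_patches, patch * patch * C): the flat
    "words" that a ViT then linearly projects into patch embeddings.
    """
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    # Carve the grid, then bring the two grid axes to the front and flatten
    patches = img.reshape(gh, patch, gw, patch, c)
    patches = patches.transpose(0, 2, 1, 3, 4).reshape(gh * gw, patch * patch * c)
    return patches

img = np.random.rand(224, 224, 3)          # standard ImageNet-sized input
tokens = image_to_patches(img)
print(tokens.shape)                        # (196, 768): 14*14 patches of 16*16*3 values
```

For a 224x224 RGB image this yields the 196 tokens of dimension 768 used by the original ViT-Base configuration; a learned linear projection and position embeddings are applied afterwards.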
PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, SimSiam, SwAV, BEiT, and MAE, as well as fundamental vision models such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PVTv2.
Official Code of Paper "Reversible Column Networks" "RevColv2"
A practical application of Transformers (ViT) to 2-D physiological signal (EEG) classification tasks; can also be tried with EMG, EOG, ECG, etc. Includes attention over the spatial dimension (channel attention) and the temporal dimension. Common spatial pattern (CSP), an efficient feature-enhancement method, is implemented in Python.
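For context on the CSP step mentioned above: two-class CSP finds spatial filters that maximize the variance of one class while minimizing the other, via a generalized eigendecomposition of the class covariance matrices. A minimal numpy-only sketch (not the repo's actual code; the function name and interface are illustrative):

```python
import numpy as np

def csp_filters(X1, X2, n_components=4):
    """Common Spatial Patterns for two-class EEG trials.

    X1, X2: arrays of shape (trials, channels, samples), one per class.
    Returns spatial filters of shape (channels, n_components) whose
    projections have extreme class-1 vs class-2 variance ratios.
    """
    def avg_cov(X):
        # Average per-trial channel covariance over all trials of a class
        return np.mean([np.cov(trial) for trial in X], axis=0)

    C1, C2 = avg_cov(X1), avg_cov(X2)
    # Whiten the composite covariance C1 + C2 ...
    d, U = np.linalg.eigh(C1 + C2)
    P = U * (d ** -0.5) @ U.T          # symmetric whitening matrix
    # ... then diagonalize class 1 in the whitened space
    vals, vecs = np.linalg.eigh(P @ C1 @ P.T)
    W = P.T @ vecs                     # spatial filters as columns
    # Keep filters from both ends of the eigenvalue spectrum
    half = n_components // 2
    idx = np.r_[np.arange(half),
                np.arange(W.shape[1] - (n_components - half), W.shape[1])]
    return W[:, idx]
```

Projecting each trial through these filters and taking log-variance of the filtered signals gives the compact CSP feature vector typically fed to a classifier.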
HugsVision is an easy-to-use Hugging Face wrapper for state-of-the-art computer vision.
Reproduction of semantic segmentation using a masked autoencoder (MAE).
Paddle large-scale classification tools; supports ArcFace, CosFace, PartialFC, and data parallel + model parallel training. Models include ResNet, ViT, Swin, DeiT, CaiT, FaceViT, MoCo, MAE, ConvMAE, and CAE.
Mimix: A Text Generation Tool and Pretrained Chinese Models
[MedIA Journal] An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
A ViT based transformer applied on multi-channel time-series EEG data for motor imagery classification
Vision Transformer using TensorFlow 2.0
An unofficial implementation of ViTPose [Y. Xu et al., 2022]
This code repository contains the code used for my "Optimizing Memory Usage for Training LLMs and Vision Transformers in PyTorch" blog post.
An unofficial implementation of TubeViT in "Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning"
Training ImageNet / CIFAR models with SOTA strategies and techniques such as ViT, KD, Rep, etc.