The following repositories are listed under the vision-transformer topic.
OpenMMLab Detection Toolbox and Benchmark
pix2tex: Using a ViT to convert images of equations into LaTeX code (a usage sketch follows this list).
This repository contains demos I made with the Transformers library by HuggingFace (a classification sketch follows this list).
[NeurIPS 2024 Best Paper Award][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-simple, user-friendly yet state-of-the-art* codebase for autoregressive image generation!
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
SwinIR: Image Restoration Using Swin Transformer (official repository)
A comprehensive paper list on Vision Transformers/Attention, including papers, code, and related websites
Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
OpenMMLab Pre-training Toolbox and Benchmark
Scenic: A Jax Library for Computer Vision Research and Beyond
Towhee is a framework dedicated to making neural data processing pipelines simple and fast.
Efficient vision foundation models for high-resolution generation and perception.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
EVA Series: Visual Representation Fantasies from BAAI
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer-based networks.
The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [TPAMI'23] "ViTPose++: Vision Transformer for Generic Body Pose Estimation"
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX.
[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
VRT: A Video Restoration Transformer (official repository)
[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
[ICCV 2021] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
Awesome List of Attention Modules and Plug&Play Modules in Computer Vision
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Explainability for Vision Transformers
A curated list of foundation models for vision and language tasks
UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery (ISPRS). Also includes other vision transformers and CNNs for satellite, aerial, and UAV image segmentation.
SOTA Semantic Segmentation Models in PyTorch
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
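The pix2tex entry above converts images of equations into LaTeX. A minimal usage sketch, assuming the package exposes `LatexOCR` as described in its README; the image path is a placeholder:

```python
from PIL import Image
from pix2tex.cli import LatexOCR  # assumes pix2tex is installed (pip install pix2tex)

# Load the pretrained ViT-based OCR model (weights are downloaded on first use).
model = LatexOCR()

# Placeholder path: point this at a cropped screenshot of a single equation.
img = Image.open("equation.png")

# Returns the predicted LaTeX source for the equation as a string.
print(model(img))
```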
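The HuggingFace Transformers demos above cover ViT models, among others. A minimal sketch of ViT image classification via the `pipeline` API; the checkpoint id and image path are illustrative and can be swapped for any ViT image-classification checkpoint on the Hub:

```python
from transformers import pipeline

# Image-classification pipeline backed by a ViT checkpoint from the Hub.
# "google/vit-base-patch16-224" is used here for illustration.
classifier = pipeline("image-classification", model="google/vit-base-patch16-224")

# Accepts a local file path, a URL, or a PIL image; path below is a placeholder.
predictions = classifier("cat.png")

# Each prediction is a dict with a class label and a confidence score.
for p in predictions:
    print(p["label"], round(p["score"], 3))
```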