kyegomez / ViTAR

Implementation of ViTAR: Vision Transformer with Any Resolution in PyTorch

Vitar Implementation

Implementation of the paper: "ViTAR: Vision Transformer with Any Resolution" PAPER LINK

Install

$ pip3 install -U vitar

Example

import torch
from vitar.main import Vitar

# Create a random input tensor
x = torch.randn(1, 3, 224, 224)

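# Initialize the Vitar model. The two positional arguments are not named in
# this example; they are presumably the embedding dimension (512) and the
# number of attention heads (8).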
model = Vitar(
    512,
    8,
    depth=12,
    patch_size=16,
    image_size=224,
    channels=3,
    ffn_dim=2048,
    num_classes=1000,
)

# Pass the input tensor through the model
out = model(x)

# Print the output tensor
print(out)
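
The example above uses `image_size=224` and `patch_size=16`. As a rough illustration of why input resolution matters for a Vision Transformer, the sketch below computes how many patch tokens a square image yields, assuming standard non-overlapping ViT patching. Whether `Vitar` patches its input exactly this way is an assumption, and `patch_grid` is a hypothetical helper that is not part of the `vitar` package.

```python
def patch_grid(image_size, patch_size):
    """Return (patches per side, total patches) for a square image.

    Assumes non-overlapping square patches, as in a standard ViT.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


# Token count grows quadratically with the side length of the image.
for size in (224, 448, 896):
    per_side, total = patch_grid(size, 16)
    print(f"{size}x{size} image -> {per_side}x{per_side} grid, {total} patch tokens")
```

Under these assumptions, the 224x224 input from the example corresponds to a 14x14 grid of 196 patch tokens, and doubling the side length quadruples the token count.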

Todo

  • Implement a training script to train this model on COCO and report the results
  • Write a large-scale pre-training script that covers multiple datasets to test the architecture

License: MIT License


Languages

Python 82.8%, Shell 17.2%