facebookresearch/deit
Official DeiT repository
Stargazers: 3911 · Watchers: 49 · Issues: 197 · Forks: 545
facebookresearch/deit Issues
ViT-B Training for DeiT (Closed 3 months ago, 2 comments)
Will you be releasing the accuracy of the tiny model trained on IN1k with the official DeiT III framework? (Updated 4 months ago)
DeiT depth 24 (CaiT, Table 1) (Closed 4 months ago, 2 comments)
Gradient accumulation code (Updated 4 months ago; see the gradient accumulation sketch after this list)
Question about different seeds per GPU with DDP (Updated 4 months ago; see the seeding sketch after this list)
Training (Updated 5 months ago)
Inclusion of "Transformers Need Registers" (Updated 7 months ago)
Slow Training (Closed 8 months ago, 2 comments)
random.seed(seed) on line 205 is commented out (Updated 8 months ago; see the seeding sketch after this list)
Checkpoints of IN21K-pretrained DeiT III (Updated 10 months ago)
Why can't I find deit_tiny_distilled_patch16_224 in hubconf? (Updated 10 months ago; see the torch.hub sketch after this list)
TracerWarning (Updated 10 months ago)
batch_size flag (Updated 10 months ago, 2 comments)
How to implement document layout analysis using DeiT-B (Closed a year ago, 2 comments)
How to launch training of CaiT models? (Updated a year ago)
Code for cosub (Closed a year ago)
The ablation experiment of DeiT (Updated a year ago, 2 comments)
ImageNet21K data preparation for pre-training (Updated a year ago, 5 comments)
Meaning of the model name (ResMLP) (Closed a year ago, 1 comment)
Can I use timm==0.4.12 instead of timm==0.3.2? (Closed a year ago, 1 comment)
unexpected keyword argument 'pretrained_cfg' (Closed a year ago, 2 comments)
Are the hyperparameters for DeiT-T and DeiT-S any different from DeiT-B's? (Closed a year ago, 1 comment)
ImageNet21k-pretrained model without fine-tuning on 1k (Closed a year ago, 2 comments)
How long is it supposed to take to train on ImageNet21k for 90 epochs with 8 V100 GPUs? (Closed a year ago, 1 comment)
Number of classes (Closed a year ago, 1 comment)
What's the accuracy of DeiT-S on CIFAR10 without pre-training? (Closed a year ago, 1 comment)
How to implement cosub training using DeiT-III (Updated a year ago, 2 comments)
What are the hyperparameters for DeiT-III (epoch 400 or 600)? (Closed a year ago)
How to implement cosub training using DeiT-III (Closed a year ago)
Single machine multi-GPU training (Updated a year ago)
Multi-node support (Closed a year ago)
Multinode Slurm Training (Closed a year ago)
What batch sizes other than 1024 have been tried when training a DeiT model? (Updated a year ago)
Is EMA used in DeiT-III? (Closed a year ago, 3 comments)
Question about throughput (Closed 2 years ago, 1 comment)
CIFAR100 pretrained model? (Closed 2 years ago, 1 comment)
The concatenation of 'cls_tokens' and 'patch_embedding' is not necessary (Closed 2 years ago, 1 comment)
What is the difference between class attention in the CaiT paper and traditional multi-head self-attention? (Closed 2 years ago, 1 comment)
What is the ImageNet-1K top-1 accuracy of training from 0 to 400 epochs (Fig. 5 of the DeiT III paper)? (Updated 2 years ago)
Config file of ViT-B/16 (Updated 2 years ago, 2 comments)
Object detection from the DeiT III pretrained model (Closed 2 years ago, 2 comments)
Uneven memory usage among GPUs with DistributedDataParallel (Closed 2 years ago)
Is it possible to see how the validation accuracy changes over epochs for DeiT? (Closed 2 years ago)
Is "unscale-lr" used in DeiT training on ImageNet1k? (Closed 2 years ago)
Reproduce PatchConvnet (Closed 2 years ago, 2 comments)
LAMB and AMP (Closed 2 years ago, 1 comment)
Question about training DeiT-small distilled (Closed 2 years ago, 2 comments)
Are uniform drop-path rates beneficial? (Closed 2 years ago, 2 comments)
DeiT-tiny .pth size is not 5M, it is 22M (Closed 2 years ago, 1 comment)
Confusion about fine-tuning (Closed 2 years ago, 4 comments)
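For the "Gradient accumulation code" issue: the training script does not ship an accumulation option, so below is a minimal, hedged sketch of gradient accumulation in PyTorch. The names `model`, `criterion`, `optimizer`, `data_loader`, and `accum_steps` are hypothetical stand-ins for whatever the training script defines, not names from this repo.

```python
import torch

def train_one_epoch_with_accumulation(model, criterion, optimizer,
                                      data_loader, device, accum_steps=4):
    # Hypothetical helper: accumulate gradients over `accum_steps`
    # micro-batches before each optimizer step.
    model.train()
    optimizer.zero_grad()
    for step, (samples, targets) in enumerate(data_loader):
        samples = samples.to(device, non_blocking=True)
        targets = targets.to(device, non_blocking=True)
        loss = criterion(model(samples), targets)
        # Scale the loss so the accumulated gradient matches a single
        # step over the full effective batch.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps the gradient magnitude equal to one step over the full effective batch of `accum_steps` micro-batches.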
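The two seeding issues ("Question about different seeds per GPU with DDP" and "random.seed(seed) on line 205 is commented out") both concern how the training script derives a per-process seed. A common pattern, and the one the issue titles point at, is to offset a base seed by the DDP rank. The sketch below assumes `torch.distributed` may or may not be initialized; it is not the repo's exact code.

```python
import random
import numpy as np
import torch
import torch.distributed as dist

def seed_everything(base_seed: int) -> None:
    # Offset the base seed by the process rank so each DDP worker draws
    # different random augmentations (a hedged sketch, not the repo's code).
    rank = dist.get_rank() if dist.is_initialized() else 0
    seed = base_seed + rank
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)  # the line the issue notes is commented out upstream
```

Offsetting by rank keeps runs reproducible for a fixed base seed while preventing every GPU from sampling identical augmentations.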
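For the hubconf question: the repo README documents loading DeiT variants through torch.hub. The sketch below uses `deit_tiny_patch16_224`; whether `deit_tiny_distilled_patch16_224` is exposed is exactly what the issue asks, so treat the model name as an assumption to verify against the repo's hubconf.py.

```python
import torch

# Load a DeiT model from facebookresearch/deit via torch.hub.
# 'deit_tiny_patch16_224' is assumed to be a registered hubconf entry;
# swap in the distilled name only if hubconf.py lists it.
model = torch.hub.load('facebookresearch/deit:main',
                       'deit_tiny_patch16_224', pretrained=True)
model.eval()
```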