dandelin / ViLT

Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"


Pre-training Time

haoshuai714 opened this issue · comments

Thanks for your great code!
In your paper, the pre-training experiments require 64 V100 GPUs.
How long did the pre-training take with 64 V100 GPUs?
Thank you!

Hi @haoshuai714

https://tensorboard.dev/experiment/mNHxDM08R6eHKeU0JHn5vg/#scalars

This is the log of MLM+ITM pre-training with 64 V100 GPUs.