YuchenLiu98 / COMM

PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models"

How much data is used in the first pretraining stage?

shipengai opened this issue 8 months ago

shipeng commented 8 months ago

Is it about 100M?