There are 9 repositories under the vision-and-language-pre-training topic.
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
A Chinese version of CLIP that achieves Chinese cross-modal retrieval and representation generation.
The Paper List of Cross-Modal Matching / Pretraining / Transferring for Preliminary Insight.
Recent Advances in Vision and Language Pre-training (VLP)
A curated list of vision-and-language pre-training (VLP). :-)
Companion Repo for the Vision Language Modelling YouTube series - https://bit.ly/3PsbsC2 - by Prithivi Da. Open to PRs and collaborations
Vision-Language Pre-Training for Boosting Scene Text Detectors (CVPR 2022)
A list of research papers on knowledge-enhanced multimodal learning
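The cross-modal retrieval mentioned in several of these repositories (e.g. the Chinese CLIP variant above) boils down to nearest-neighbor search over a shared embedding space: an image tower and a text tower map their inputs to vectors, and retrieval ranks images by cosine similarity to a text query. A minimal sketch of that ranking step, with random unit vectors standing in for real encoder outputs (no actual CLIP/BLIP model is loaded here):

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # Project embeddings onto the unit sphere so the dot product
    # equals cosine similarity.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for encoder outputs: in a CLIP-style model these would come
# from the image and text towers trained with a contrastive objective.
image_embeddings = normalize(rng.normal(size=(5, 64)))  # 5 "images"

# Build a text query embedding deliberately close to image 2,
# mimicking a caption that matches that image.
text_embedding = normalize(image_embeddings[2] + 0.01 * rng.normal(size=64))

# Cross-modal retrieval: score every image against the text query
# and return the best match.
scores = image_embeddings @ text_embedding
best = int(np.argmax(scores))
print(best)  # the query was constructed near image 2
```

In practice the same dot-product ranking runs over millions of precomputed image embeddings, typically via an approximate nearest-neighbor index rather than a dense matrix product.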