BriVL

The code repository for our Nature Communications paper "Towards artificial general intelligence via a multimodal foundation model".

BriVL (Bridging Vision and Language)

Prerequisites

Environment:
- python 3.8.3
- pathlib 2.3.5
- yaml 0.2.5
- easydict 1.9
- pillow 7.2.0
- numpy 1.18.5
- pytorch 1.7.1
- torchvision 0.8.2
- transformers 4.6.1 (installation instructions)
- timm 0.4.9
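
To compare an existing environment against the versions listed above, something like the following sketch may help. It relies only on the standard-library importlib.metadata module (available from Python 3.8). The distribution names in EXPECTED are assumptions about how these packages are published on PyPI (for example, pytorch is assumed to be installed as torch and pillow as Pillow). pathlib and yaml are left out because their distribution names depend on how they were installed.

```python
from importlib import metadata

# Versions taken from the environment list; the keys are ASSUMED PyPI
# distribution names and may differ in a conda-based environment.
EXPECTED = {
    "easydict": "1.9",
    "Pillow": "7.2.0",
    "numpy": "1.18.5",
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "transformers": "4.6.1",
    "timm": "0.4.9",
}


def check_versions(expected, get_version=metadata.version):
    """Map each package name to (wanted, found, matches).

    `found` is None when the package is not installed. A local build
    suffix such as "+cu110" is ignored when comparing versions.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        matches = found is not None and found.split("+")[0] == wanted
        report[name] = (wanted, found, matches)
    return report


if __name__ == "__main__":
    for name, (wanted, found, ok) in check_versions(EXPECTED).items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: wanted {wanted}, found {found} [{status}]")
```

A mismatch does not necessarily mean the code will fail; the list above only records the versions the code was developed with.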

Pre-trained weights of BriVL w/ RoBERTa-base:
- Google Drive
- Baidu Net Disk (code: 6een)

Note that the pre-trained model should be placed under the ./pretrained/ folder. While the paper is under review, we release only BriVL w/ RoBERTa-base; BriVL w/ RoBERTa-large, pre-trained on the 650M data, will be made available after the paper is published.
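
As a quick pre-flight check that the download ended up in the right place, a minimal sketch along these lines could be used. The helper name list_weight_files is hypothetical, and since the checkpoint file names are not stated here, the sketch only verifies that the folder exists and contains at least one file.

```python
from pathlib import Path


def list_weight_files(folder="./pretrained"):
    """Return the sorted files found under `folder`, raising if there are none.

    No particular checkpoint file name is assumed; any regular file counts.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(
            f"{root} does not exist; create it and place the downloaded weights inside"
        )
    files = sorted(p for p in root.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"{root} is empty; download the pre-trained weights first")
    return files
```

Calling list_weight_files() from the repository root before running any other script gives an early, readable error if the weights are missing.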

Feature Extraction

Code for image/text feature extraction and cross-modal similarity computation is provided in example.py. Before running it, prepare images.csv and texts.csv under the ./input_data/ folder: each line of images.csv should be an image file name, and each line of texts.csv should be a piece of Chinese text.
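
The input layout described above can be produced with a short script such as the sketch below. The helper names write_lines, cosine_similarity and similarity_matrix are hypothetical and are not part of example.py. The sketch assumes the two files are plain UTF-8 text with exactly one entry per line and no header row. The similarity part illustrates cosine similarity, one common way to compare image and text features; whether example.py computes exactly this score is an assumption.

```python
import math
from pathlib import Path


def write_lines(path, entries):
    """Write one entry per line (UTF-8), creating parent folders if needed."""
    path = Path(path)
    for entry in entries:
        if "\n" in entry or "\r" in entry:
            raise ValueError(f"entry must fit on a single line: {entry!r}")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(entries) + "\n", encoding="utf-8")


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def similarity_matrix(image_feats, text_feats):
    """Rows index images, columns index texts."""
    return [[cosine_similarity(i, t) for t in text_feats] for i in image_feats]
```

For example, write_lines("./input_data/images.csv", ["cat.jpg", "dog.jpg"]) and write_lines("./input_data/texts.csv", ["一只猫", "一只狗"]) would create the two input files, where the file names and captions are placeholders. With real features, the largest value in each row of similarity_matrix points to the text that best matches that image.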
