FangSen9000 / brivl-nmi



BriVL

The code repository for our Nature Communications paper "Towards artificial general intelligence via a multimodal foundation model".

BriVL (Bridging Vision and Language)

[Figure: BriVL architecture]

Prerequisites

  • Environment:
    • python 3.8.3
    • pathlib 2.3.5
    • yaml 0.2.5
    • easydict 1.9
    • pillow 7.2.0
    • numpy 1.18.5
    • pytorch 1.7.1
    • torchvision 0.8.2
    • transformers 4.6.1 (installation instructions)
    • timm 0.4.9
  • Pre-trained weights of BriVL w/ RoBERTa-base:
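To compare a local environment against the versions listed above, a small helper like the following may be useful. It is a minimal sketch and not part of the repository. The function name `check_versions` is hypothetical, and the PyPI distribution names (`torch` for pytorch, `Pillow` for pillow) are assumptions. pathlib and yaml are left out because their distribution names are ambiguous. Local build suffixes such as `+cu110` are ignored when comparing versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the Prerequisites list, keyed by (assumed) PyPI distribution name.
# pathlib and yaml are omitted because their distribution names are ambiguous.
EXPECTED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "transformers": "4.6.1",
    "timm": "0.4.9",
    "numpy": "1.18.5",
    "Pillow": "7.2.0",
    "easydict": "1.9",
}


def check_versions(expected=EXPECTED):
    """Return {name: (expected, installed_or_None, matches)} for each package (hypothetical helper)."""
    report = {}
    for name, want in expected.items():
        try:
            have = version(name)
        except PackageNotFoundError:
            have = None
        # Drop a local version suffix such as "+cu110" before comparing.
        matches = have is not None and have.split("+")[0] == want
        report[name] = (want, have, matches)
    return report


if __name__ == "__main__":
    for name, (want, have, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name:<12} expected {want:<8} found {have or 'not installed':<14} {status}")
```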

Note that the pre-trained model should be placed under the ./pretrained/ folder. While the paper is under review, we release only BriVL w/ RoBERTa-base. BriVL w/ RoBERTa-large, pre-trained on the 650M image-text pairs, will be made available after the paper is published.

Feature Extraction

Code for image/text feature extraction and cross-modal similarity computation is provided in example.py. Before running it, prepare images.csv and texts.csv under the ./input_data/ folder. Each line of images.csv should contain one image file name, and each line of texts.csv should contain one piece of Chinese text.
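The sketch below covers the two steps described above: preparing the input files and scoring image-text pairs. It is not taken from example.py, and the helper names `write_input_csvs` and `cosine_similarity_matrix` are hypothetical. It assumes the CSV files hold one plain entry per line with no header row or quoting. It also assumes that similarity is computed as cosine similarity between L2-normalized feature vectors, which is a common choice for contrastively trained models. Check example.py for the exact format and computation.

```python
import math
from pathlib import Path


def write_input_csvs(image_names, texts, folder="./input_data"):
    """Write images.csv and texts.csv with one entry per line (assumed: no header, no quoting)."""
    out = Path(folder)
    out.mkdir(parents=True, exist_ok=True)
    paths = (out / "images.csv", out / "texts.csv")
    for path, entries in zip(paths, (image_names, texts)):
        with open(path, "w", encoding="utf-8") as f:
            for entry in entries:
                if "\n" in entry:
                    raise ValueError(f"entry must fit on one line: {entry!r}")
                f.write(entry + "\n")
    return paths


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(image_feats, text_feats):
    """sim[i][j] = cosine similarity between image i and text j."""
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]
```

For example, `write_input_csvs(["cat.jpg", "dog.jpg"], ["一只猫", "一只狗"])` writes two lines to each file. Here the image file names are hypothetical. After feature extraction, `cosine_similarity_matrix(image_feats, text_feats)` returns one row per image and one column per text. A real pipeline would do this with batched tensor operations rather than Python lists.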

About

License: Other

Languages

Language: Python 100.0%