mohammad-albarham / cluster_translation_13M_captions_dataset


Overview

This repository contains the code used to translate about 13 million image-caption pairs.

Main dataset

The dataset was downloaded from the BLIP repository as:

CC3M+CC12M+SBU, filtered synthetic captions by ViT-L, here.

After downloading the dataset, I split it into chunks purely to make the data easier to manage; this step is optional. A minimal chunking sketch follows.
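For illustration, here is a minimal chunking sketch in Python. It assumes the captions sit in a single JSON file of records; the filenames and chunk size are hypothetical, not necessarily what this repository uses:

import json

CHUNK_SIZE = 100_000  # records per chunk; an arbitrary choice

# Hypothetical input filename; the BLIP captions are distributed as JSON.
with open("captions.json") as f:
    records = json.load(f)

# Write each slice of records to its own numbered chunk file.
for i in range(0, len(records), CHUNK_SIZE):
    with open(f"chunk_{i // CHUNK_SIZE:04d}.json", "w") as out:
        json.dump(records[i : i + CHUNK_SIZE], out)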

Setup the environment

Create a Python virtual environment to keep your system clean :)

python3 -m venv .translation_venv
source .translation_venv/bin/activate

Install the required packages

pip3 install -r requirements.txt

Main code used for translation

The main code used for translation is in the nllb_multi_gpus_inference file. The code was initially adapted from here.
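For orientation, here is a minimal single-GPU sketch of NLLB inference with Hugging Face transformers; it is not the repository's code. The checkpoint name, the English-to-Arabic language codes, and the batch size are assumptions. To use all four GPUs, you would launch one such process per GPU (e.g. via CUDA_VISIBLE_DEVICES), each working on its own chunk of captions:

import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL = "facebook/nllb-200-distilled-600M"  # assumed checkpoint
DEVICE = "cuda:0"  # one process per GPU: cuda:0 .. cuda:3

tokenizer = AutoTokenizer.from_pretrained(MODEL, src_lang="eng_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL).to(DEVICE).eval()

def translate(captions, target_lang="arb_Arab", batch_size=32):
    """Translate a list of English captions in batches; the target language is an assumption."""
    results = []
    for i in range(0, len(captions), batch_size):
        batch = captions[i : i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True).to(DEVICE)
        with torch.no_grad():
            out = model.generate(
                **inputs,
                # NLLB selects the target language via the first generated token.
                forced_bos_token_id=tokenizer.convert_tokens_to_ids(target_lang),
            )
        results.extend(tokenizer.batch_decode(out, skip_special_tokens=True))
    return results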

GPU used in this translation

  • I used a cluster of 4 NVIDIA A10 GPUs, each with 24 GB of memory.

LICENSE

  • The code is released under the MIT LICENSE; for the dataset, refer to the BLIP LICENSE here
