This repository provides an overview of all components used for the creation of BLOOMZ & mT0 and xP3 introduced in the paper Crosslingual Generalization through Multitask Finetuning.
Name | Explanation | Example models |
---|---|---|
xP3 | Mixture of 13 training tasks in 46 languages with English prompts | BLOOMZ & mT0-13B |
xP3mt | Mixture of 13 training tasks in 46 languages with prompts in 20 languages (machine-translated from English) | BLOOMZ-MT & mT0-13B-MT |
xP3all | xP3 + our evaluation datasets adding an additional 3 tasks for a total of 16 tasks in 46 languages with English prompts | |
xP3megds | Megatron-DeepSpeed processed version of xP3 | BLOOMZ |
P3 | Repreprocessed version of the English-only P3 with 8 training tasks | BLOOMZ-P3 & mT0-13B-P3 |
| | 300M | 580M | 1.2B | 3.7B | 13B | 560M | 1.1B | 1.7B | 3B | 7.1B | 176B |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Multitask finetuned on xP3. Recommended for prompting in English. | mt0-small | mt0-base | mt0-large | mt0-xl | mt0-xxl | bloomz-560m | bloomz-1b1 | bloomz-1b7 | bloomz-3b | bloomz-7b1 | bloomz |
| Multitask finetuned on xP3mt. Recommended for prompting in non-English. | | | | | mt0-xxl-mt | | | | | bloomz-7b1-mt | bloomz-mt |
| Multitask finetuned on P3. Released for research purposes only; strictly inferior to the models above! | | | | | mt0-xxl-p3 | | | | | bloomz-7b1-p3 | bloomz-p3 |
| Original pretrained checkpoints. Not recommended. | mt5-small | mt5-base | mt5-large | mt5-xl | mt5-xxl | bloom-560m | bloom-1b1 | bloom-1b7 | bloom-3b | bloom-7b1 | bloom |
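Once downloaded, any finetuned checkpoint from the table above can be tried out via the Hugging Face `transformers` library. Below is a minimal sketch for a BLOOMZ checkpoint (decoder-only); for the mT0 family, which is encoder-decoder, you would use `AutoModelForSeq2SeqLM` instead. The prompt is just an illustration:

```python
# Minimal inference sketch for a BLOOMZ checkpoint (decoder-only).
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-560m"  # smallest variant from the table above

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# Finetuned models are prompted with natural-language instructions.
inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Larger checkpoints benefit from `device_map="auto"` and half precision, but the call pattern stays the same.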
We have processed & uploaded xP3. If you want to recreate it, follow these steps:

- Get promptsource: for xP3mt, `git clone -b xp3mt https://github.com/Muennighoff/promptsource.git`; for xP3, `git clone -b tr13 https://github.com/Muennighoff/promptsource.git`. Then install it: `cd promptsource; pip install -e .`
- Get the required packages: `pip install -q datasets iso-639`
- Get the creation script & edit it if necessary:
    - For xP3mt, set `USE_ENGLISH_PROMPTS = False` at the beginning.
    - For xP3, set `USE_ENGLISH_PROMPTS = True` at the beginning.
- Run the script, e.g. via `python prepare_xp3.py` or a SLURM script.
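Conceptually, the creation script applies promptsource templates to dataset rows and writes the results out as JSON lines with `inputs`/`targets` fields. A toy sketch of that idea (the real templates are Jinja templates applied via promptsource's `Template.apply()`, not Python format strings as assumed here):

```python
import json

# Hypothetical stand-in for a promptsource template.
def apply_template(row, template):
    """Render a prompted (inputs, targets) pair from a raw dataset row."""
    return template["prompt"].format(**row), row[template["target_field"]]

template = {"prompt": "Review: {text}\nIs this review positive?", "target_field": "label"}
row = {"text": "Great movie!", "label": "yes"}

inputs, targets = apply_template(row, template)

# One JSON line of an xP3-style data file.
line = json.dumps({"inputs": inputs, "targets": targets})
print(line)
```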
- Download the pretrained model checkpoint, which is of shape PP=12, TP=4, DP=4. If you want to continue finetuning, you can also use our finetuned checkpoint, which is of shape PP=72, TP=1, DP=4.
- Set up the training code: `git clone -b t0loading https://github.com/bigscience-workshop/Megatron-DeepSpeed` & follow its setup guide.
- Download the Megatron-DeepSpeed processed xP3megds, or re-preprocess it for Megatron-DeepSpeed yourself by downloading xP3, removing the `merged_{lang}.jsonl` files & using the script here.
- Set up & run the training script. We use SLURM scripts available at bigscience-workshop/bigscience/train/tr13-mtf, referred to as `xp3capmixnewcodelonglossseq`. E.g. this is the script launched to train bloomz. Important parts of the script to modify are:
    - `#SBATCH` variables, such as nodes, gpus, time, etc. Our SLURM guide is here.
    - `source $six_ALL_CCFRWORK/start-tr13f-6B3-ml-t0` to point to your own conda environment set up via Megatron-DeepSpeed.
    - PATH environment variables, notably `TRAIN_DATA_PATH` & `VALID_DATA_PATH`, which point to files listing your processed training and validation data. We provide our files in this repository (`xp3capmixnewcodelong_train.txt` & `xp3capmixnewcodelong_validation.txt`), but you will likely want to change the paths inside. The percentages per language are based on how much each language makes up in xP3, with code being slightly upsampled.
    - `PP_SIZE=72`, `TP_SIZE=1`, `BATCH_SIZE` & co. specifying the layout. This will depend on the hardware available to you. If you change them, you may have to reshape the model. For reshaping you need to write new code or use a universal checkpoint (still to be uploaded).
    - If you want to restart from a saved checkpoint (e.g. after training a few steps), make sure to remove the `--no-load-optim` & `--reset-progress` flags.
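The per-language training percentages described above can be sketched as a simple normalization over corpus sizes with code upsampled. The sizes and the upsampling factor below are made-up placeholders, not the actual xP3 numbers:

```python
# Hypothetical per-language sizes (e.g. GB); the real values come from xP3 itself.
sizes = {"en": 500.0, "es": 120.0, "fr": 110.0, "code": 80.0}
CODE_UPSAMPLING = 1.5  # hypothetical factor for the slight code upsampling

# Scale code up, then normalize everything to percentages that sum to 1.
weighted = {lang: size * (CODE_UPSAMPLING if lang == "code" else 1.0)
            for lang, size in sizes.items()}
total = sum(weighted.values())
percentages = {lang: w / total for lang, w in weighted.items()}

for lang, pct in percentages.items():
    print(f"{lang}: {pct:.4f}")
```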
Helpful resources:
Follow the finetuning instructions here, making sure to use pretrained mT5 models & the xP3 dataset.
Helpful resources:
Evaluation results are all available in this repository: https://huggingface.co/datasets/bigscience/evaluation-results under the respective models. Below we explain how to run evaluation.
We evaluate the models with rank evaluation on XCOPA, XNLI, XStoryCloze & XWinograd:

- Get the promptsource fork: `git clone -b xp3mt https://github.com/Muennighoff/promptsource.git` & `cd promptsource; pip install -e .`
- Get the t-zero fork: `git clone -b muennighoff/upgrdps https://github.com/Muennighoff/t-zero.git` & `cd t-zero; pip install -e .`
- Download the model & run the evaluation script, for example for bloomz.
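In rank evaluation, each answer choice of a task is scored by its log-likelihood under the model, and the highest-scoring choice is taken as the prediction. A toy sketch of that final selection step (the log-likelihood values here are made up):

```python
def rank_classify(loglikelihoods):
    """Return the answer choice with the highest model log-likelihood.

    loglikelihoods: dict mapping each candidate completion to the summed
    log-probability the model assigns to its tokens.
    """
    return max(loglikelihoods, key=loglikelihoods.get)

# Made-up scores for a three-choice example; -4.2 is the least negative.
prediction = rank_classify({"yes": -4.2, "no": -7.9, "maybe": -6.1})
print(prediction)  # → "yes"
```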
We evaluate generation on translation & summarization during training for validation:

- Get the promptsource fork: `git clone -b xp3mt https://github.com/Muennighoff/promptsource` & `cd promptsource; pip install -e .`
- Get bigscience-workshop/lm-evaluation-harness, which we use for evaluating translation & summarization during training. The scripts are available here.
We also evaluate code generation on HumanEval:

- Get the code evaluation dataset: `git clone https://github.com/loubnabnl/bloom-code-evaluation`
- Set `prepend_eos` to `False` in `code_eval.py` at `complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=True, **gen_kwargs)`, i.e. `complete_code(model, tokenizer, prompt, num_completions=1, prepend_eos=False, **gen_kwargs)`
- Download the model & run the evaluation script, swapping out MODEL_CKPT for your path; for example, for bloomz use this.
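The effect of the `prepend_eos` switch can be illustrated with a toy stand-in: when it is `True`, an EOS token is prepended to the HumanEval prompt before generation, which is undesirable for these models. This `build_prompt` helper is a hypothetical simplification, not the actual logic inside `complete_code`:

```python
def build_prompt(prompt, prepend_eos, eos_token="</s>"):
    """Hypothetical simplification of the behavior toggled by prepend_eos."""
    return eos_token + prompt if prepend_eos else prompt

# With prepend_eos=False the HumanEval prompt is passed through unchanged.
print(build_prompt("def add(a, b):", prepend_eos=False))  # → "def add(a, b):"
```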
- Figure 1: `plotstables/xp3_languages.ipynb` & colab
- Figure 2: `plotstables/xp3_taxonomy.drawio` & `plotstables/xp3_taxonomy.pdf`
- Figure 3: `plotstables/xp3_variants.pdf` & drawings
- Figure 4: `plotstables/xp3_generalization_bar.pdf` & colab
- Figure 5: `plotstables/lang_generalization` & colab
- Figure 6: `plotstables/scale.pdf` & colab
- Figure 7: `plotstables/validation.pdf` & colab
- Figure 8: `plotstables/pretraining_sizes.pdf` & colab
- Figure 9: `plotstables/english_task_generalization.pdf` & colab
- Figure 10: `plotstables/task_generalization.pdf` & colab
- Figure 11: `plotstables/roots_xp3_languages.pdf` & colab, requiring some of the files in `plotstables/contamination`
- Figure 12: `plotstables/examples/bloom_code_example.py` & `plotstables/examples/bloom_code_light.pdf` & `plotstables/examples/bloomz_code_light.pdf`; the raw code files can be found here & here
- Figures 13-16: `plotstables/examples/*.pdf` & `plotstables/examples/generations.drawio`
- Table 1: Colab & Colab for the complex version
- Table 2: Adapted from the Codex paper
- Table 3: Manual
- Table 4: `plotstables/compute_codegen_len.ipynb` for generations & `plotstables/countcode.py` for xP3
- Table 5: `plotstables/levenshtein.py`
- Table 6: Same as Table 1 with languages swapped from L1 to L2
- Table 7: Colab
- Prompt Appendix: https://github.com/albanie/prompt_formatting_in_latex
```bibtex
@misc{muennighoff2022crosslingual,
      title={Crosslingual Generalization through Multitask Finetuning},
      author={Niklas Muennighoff and Thomas Wang and Lintang Sutawika and Adam Roberts and Stella Biderman and Teven Le Scao and M Saiful Bari and Sheng Shen and Zheng-Xin Yong and Hailey Schoelkopf and Xiangru Tang and Dragomir Radev and Alham Fikri Aji and Khalid Almubarak and Samuel Albanie and Zaid Alyafeai and Albert Webson and Edward Raff and Colin Raffel},
      year={2022},
      eprint={2211.01786},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```