For CLIP models:
- Download checkpoints from Google Drive.
- The running scripts are placed in the `scripts` directory; you can run `bash clip_pareto_moe.sh <method> <version>`, where the method can be `moe_ls` or `moe_epo`. For the detailed hyperparameter configuration of the different versions, please refer to the bash script. For the LS, EPO, and MGDA baselines, see `clip_mtl.sh`.
- The results will be saved in the `results` directory.
For GPT-2 models:
- Download checkpoints from HuggingFace; fine-tuned checkpoints will be available after double-blind review.
- The fine-tuning scripts are also provided, so you can fine-tune the model on a specific task, e.g. `python scripts/gpt2_finetune.py --dataset qqp`.
- The running scripts are placed in the `scripts` directory; you can run `bash gpt2_pareto_moe.sh <method> <version>`, where the method can be `moe_ls` or `moe_epo`. For the detailed hyperparameter configuration of the different versions, please refer to the bash script. For other baselines, see `gpt2_merge.sh`.
- The results will be saved in the `results` directory.
For DDP training, pass `--num_devices <num_devices>` as an argument, for example: `bash gpt2_pareto_moe.sh moe_ls 6 --num_devices 4`.
For GPT-2 models with DDP training, you need to modify `site-packages/transformers/models/gpt2/modeling_gpt2.py`, replacing

```python
attn_weights = torch.where(causal_mask, attn_weights.to(attn_weights.dtype), mask_value)
```

with

```python
# clone the tensor to avoid an in-place operation error
attn_weights = torch.where(causal_mask.clone(), attn_weights.to(attn_weights.dtype), mask_value)
```
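Since the exact path of `modeling_gpt2.py` depends on your Python environment, one way to apply this one-line patch is the sketch below. It assumes GNU `sed` (on macOS/BSD, use `sed -i ''` instead of `sed -i`) and that `transformers` is importable in the active environment:

```shell
# Locate the installed modeling_gpt2.py in the current environment
GPT2_FILE=$(python -c "from transformers.models.gpt2 import modeling_gpt2; print(modeling_gpt2.__file__)")
# Add .clone() to the causal mask in the torch.where call
sed -i 's/torch.where(causal_mask,/torch.where(causal_mask.clone(),/' "$GPT2_FILE"
```

If you prefer not to edit an installed package in place, you can make the same change in a local copy of `transformers` installed with `pip install -e`.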