bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2


Finetuning BLOOM

AnaRhisT94 opened this issue

Hi,

What's the process in finetuning BLOOM?
Did anyone succeed and willing to share the code?

Thanks!

Hi,
I am not sure, but the original Megatron code had an argument (I don't remember the name) that resets the optimizer, dataloader, etc., which you could use to do finetuning. I'm not sure if it is present or works in this repo.
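For what it's worth, in upstream Megatron-LM that argument is `--finetune` (it loads the model weights from `--load` but skips the optimizer/RNG state and resets the iteration counter), with `--no-load-optim` and `--no-load-rng` as finer-grained switches. Below is a minimal sketch of a finetuning launch under the assumption that this fork keeps those flags; paths, hyperparameters and the launcher are placeholders, so check everything against this repo's arguments.py:

```bash
# Sketch only: assumes this fork keeps upstream Megatron-LM's --finetune /
# --no-load-optim / --no-load-rng flags; paths and hyperparameters are placeholders.
# Launch however you normally do (srun, deepspeed, torch.distributed.run, ...).
python pretrain_gpt.py \
    --load /path/to/bloom-megatron-checkpoint \
    --save /path/to/finetuned-checkpoints \
    --finetune \
    --no-load-optim \
    --no-load-rng \
    --data-path /path/to/preprocessed/finetune_text_document \
    --lr 1e-5 \
    --train-iters 10000 \
    --micro-batch-size 1 \
    --global-batch-size 64
    # ...plus the usual model, tokenizer, parallelism and DeepSpeed arguments
```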

Hey @mayank31398, just wondering: is pretrain_gpt.py used for pretraining the BLOOM models? If yes, are the architectures for GPT and BLOOM the same? I see different implementations for GPT and BLOOM in Hugging Face Transformers.

Also, I am trying to finetune the StarCoder model using Megatron-DeepSpeed 3D parallelism. Can you give some idea of how that can be done?

This is the script used for launching 176B: https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/tr11-176B-ml.slurm
The architecture is not the same: BLOOM uses ALiBi attention biases, while GPT uses absolute (learned) position embeddings.
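If it helps to see where that difference lives, the position-embedding choice is (as far as I remember) just a launcher flag in this fork, so the model part of the tr11 script looks roughly like the sketch below. The flag name and the 176B shape values are from memory, so verify them against the linked slurm script and arguments.py:

```bash
# Approximate BLOOM-176B model arguments (from memory; verify against
# tr11-176B-ml.slurm). The ALiBi/absolute choice is the key architectural
# difference from a vanilla GPT config.
GPT_ARGS=" \
    --num-layers 70 \
    --hidden-size 14336 \
    --num-attention-heads 112 \
    --seq-length 2048 \
    --position-embedding-type alibi \
    "
# A GPT-2-style model would instead use learned absolute position embeddings,
# i.e. --position-embedding-type absolute (or whatever the default is here).
```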

For StarCoder, 4D parallelism is used: tensor parallel, pipeline parallel, sequence parallel, and data parallel.
This is the repo used for StarCoder and SantaCoder training: https://github.com/bigcode-project/Megatron-LM
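The first three degrees are set explicitly with the usual Megatron flags, and data parallelism is whatever is left over: DP = world_size / (TP × PP). A rough sketch with placeholder sizes follows (flag names as in NVIDIA Megatron-LM, which the bigcode fork is based on, so double-check against that repo):

```bash
# Placeholder parallelism config. With 64 GPUs total, TP=4 and PP=4 leave a
# data-parallel size of 64 / (4 * 4) = 4. Sequence parallelism splits activations
# along the sequence dimension within the same tensor-parallel group, so it does
# not change the GPU count.
PARALLEL_ARGS=" \
    --tensor-model-parallel-size 4 \
    --pipeline-model-parallel-size 4 \
    --sequence-parallel \
    "
```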