bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2


Finetuning BLOOM

AnaRhisT94 opened this issue

Hi,

What's the process in finetuning BLOOM?
Did anyone succeed and willing to share the code?

Thanks!

Hi,
I am not sure, but the original Megatron code had an argument (I don't remember the name) that resets the optimizer, dataloader, etc., which you could use to do finetuning. I'm not sure if it is present or works in this repo.
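For what it's worth, in upstream Megatron-LM that argument is `--finetune` (it loads the model weights from `--load` but skips the optimizer/RNG state and resets the iteration counter), with `--no-load-optim` and `--no-load-rng` as finer-grained switches. Below is a minimal sketch of a finetuning launch under the assumption that this fork keeps those flags; paths, hyperparameters and the launcher are placeholders, so check everything against this repo's arguments.py:

```bash
# Sketch only: assumes this fork keeps upstream Megatron-LM's --finetune /
# --no-load-optim / --no-load-rng flags; paths and hyperparameters are placeholders.
# Launch however you normally do (srun, deepspeed, torch.distributed.run, ...).
python pretrain_gpt.py \
    --load /path/to/bloom-megatron-checkpoint \
    --save /path/to/finetuned-checkpoints \
    --finetune \
    --no-load-optim \
    --no-load-rng \
    --data-path /path/to/preprocessed/finetune_text_document \
    --lr 1e-5 \
    --train-iters 10000 \
    --micro-batch-size 1 \
    --global-batch-size 64
    # ...plus the usual model, tokenizer, parallelism and DeepSpeed arguments
```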

Hey @mayank31398, just wondering: is pretrain_gpt.py used for pretraining the BLOOM models? If yes, are the architectures for GPT and BLOOM the same? I see different implementations for GPT and BLOOM in Hugging Face Transformers.

Also, I am trying to finetune the StarCoder model using Megatron-DeepSpeed 3D parallelism. Can you give some idea of how that can be done?

This is the script used for launching 176B: https://github.com/bigscience-workshop/bigscience/blob/master/train/tr11-176B-ml/tr11-176B-ml.slurm
The architecture is not the same: BLOOM uses ALiBi attention biases, while GPT uses absolute (learned) position embeddings.
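If it helps to see where that difference lives, the position-embedding choice is (as far as I remember) just a launcher flag in this fork, so the model part of the tr11 script looks roughly like the sketch below. The flag name and the 176B shape values are from memory, so verify them against the linked slurm script and arguments.py:

```bash
# Approximate BLOOM-176B model arguments (from memory; verify against
# tr11-176B-ml.slurm). The ALiBi/absolute choice is the key architectural
# difference from a vanilla GPT config.
GPT_ARGS=" \
    --num-layers 70 \
    --hidden-size 14336 \
    --num-attention-heads 112 \
    --seq-length 2048 \
    --position-embedding-type alibi \
    "
# A GPT-2-style model would instead use learned absolute position embeddings,
# i.e. --position-embedding-type absolute (or whatever the default is here).
```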

For StarCoder, 4D parallelism is used: tensor parallel, pipeline parallel, sequence parallel, and data parallel.
This is the repo used for StarCoder and SantaCoder training: https://github.com/bigcode-project/Megatron-LM
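The first three degrees are set explicitly with the usual Megatron flags, and data parallelism is whatever is left over: DP = world_size / (TP × PP). A rough sketch with placeholder sizes follows (flag names as in NVIDIA Megatron-LM, which the bigcode fork is based on, so double-check against that repo):

```bash
# Placeholder parallelism config. With 64 GPUs total, TP=4 and PP=4 leave a
# data-parallel size of 64 / (4 * 4) = 4. Sequence parallelism splits activations
# along the sequence dimension within the same tensor-parallel group, so it does
# not change the GPU count.
PARALLEL_ARGS=" \
    --tensor-model-parallel-size 4 \
    --pipeline-model-parallel-size 4 \
    --sequence-parallel \
    "
```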