bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Is there any script for pretraining/finetuning BLOOM?

drxmy opened this issue

Specifically, I am looking for a script that uses DeepSpeed PP and ZeRO-DP, like this one: https://github.com/bigscience-workshop/Megatron-DeepSpeed/tree/bitfit#deepspeed-pp-and-zero-dp

As I understand it, that script should be able to load BLOOM with a few changes, for example adding `--position-embedding-type alibi`. I have run some experiments, but it keeps failing.
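For concreteness, here is roughly what I have been trying: the DeepSpeed PP and ZeRO-DP launch from the README with BLOOM-specific flags layered on top. All model sizes, paths, and the `ds_config.json` are placeholders, and `--embed-layernorm` (for BLOOM's LayerNorm after the word embeddings) plus the `PretrainedFromHF` tokenizer flags are my guesses rather than a verified recipe:

```bash
# Rough sketch, not a tested recipe. Placeholders: all sizes, paths,
# and ds_config.json. Flags added on top of the README's GPT example:
#   --position-embedding-type alibi  (BLOOM uses ALiBi, not learned positions)
#   --embed-layernorm                (guess: BLOOM's LayerNorm after the
#                                     word embeddings)
deepspeed pretrain_gpt.py \
    --tensor-model-parallel-size 1 \
    --pipeline-model-parallel-size 2 \
    --num-layers 24 \
    --hidden-size 1024 \
    --num-attention-heads 16 \
    --seq-length 2048 \
    --max-position-embeddings 2048 \
    --micro-batch-size 1 \
    --global-batch-size 16 \
    --train-iters 1000 \
    --lr 1e-5 \
    --tokenizer-type PretrainedFromHF \
    --tokenizer-name-or-path bigscience/tokenizer \
    --data-path my-dataset_text_document \
    --position-embedding-type alibi \
    --embed-layernorm \
    --load bloom-checkpoint-dir \
    --fp16 \
    --deepspeed \
    --deepspeed_config ds_config.json \
    --zero-stage 1
```

It launches, but loading the converted BLOOM checkpoint is where it fails for me.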

I would really appreciate it if someone could give me some advice!