bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2


Hello, I ran into a problem

etoilestar opened this issue · comments

Hello, when I run the script to train a GPT model, I get an assertion error: "Not sure how to proceed, we were given deepspeed configs in the deepspeed arguments and deepspeed." The script I used is https://github.com/bigscience-workshop/Megatron-DeepSpeed#deepspeed-pp-and-zero-dp. Can you tell me why this happens?

Can you please share the assertion message and stack trace?

OK, I will give it a try. On another note, I cannot find the BF16Optimizer mentioned at https://huggingface.co/blog/zh/bloom-megatron-deepspeed#bf16optimizer. Could you give me some tips?


I ran into the same problem while following "start_fast.md". I would like to know how to solve it. Thank you!

Commenting out args=args at line 429 of megatron/training.py solves this problem:

model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    config=config,
    #args=args,
)

deepspeed.initialize can't be given both config and args.deepspeed_config; you should remove one of them.
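
For reference, the opposite fix should also work. This is just a minimal sketch, assuming your launch script passes --deepspeed_config so that args.deepspeed_config is populated: keep args and drop the explicit config keyword instead, so deepspeed.initialize only sees one source of the DeepSpeed config.

model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    args=args,          # args.deepspeed_config supplies the DeepSpeed config
    # config=config,    # removed so only one config source is passed in
)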