Hello, I met a problem
etoilestar opened this issue · comments
Hello, when I run the script to train a GPT model, I get an assertion error: "Not sure how to proceed, we were given deepspeed configs in the deepspeed arguments and deepspeed." The script I used is https://github.com/bigscience-workshop/Megatron-DeepSpeed#deepspeed-pp-and-zero-dp. Can you tell me why?
Can you please share the assertion message and stack trace?
Please try https://github.com/bigscience-workshop/Megatron-DeepSpeed/blob/main/run_bf16.sh or the equivalent run_fp16.sh.
OK, I will give it a try. On another note, I cannot find the BF16Optimizer mentioned at https://huggingface.co/blog/zh/bloom-megatron-deepspeed#bf16optimizer. Could you give me some tips?
I met the same problem when I was following "start_fast.md". I want to know how to solve it. Thank you!
Commenting out `args=args` on line 429 of megatron/training.py will solve this problem:
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model[0],
    optimizer=optimizer,
    lr_scheduler=lr_scheduler,
    config=config,
    # args=args,
)
`deepspeed.initialize` can't be given both `config` and `args.deepspeed_config`; you should remove one of them.
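The conflict can be illustrated in miniature without DeepSpeed installed. Below is a hypothetical helper (not DeepSpeed's actual code) sketching the mutually-exclusive check that trips the assertion: a config may arrive via the `config` keyword or via `args.deepspeed_config`, but not both.

```python
def resolve_config(config=None, args=None):
    """Sketch of the exclusivity check: accept a DeepSpeed config from the
    `config` kwarg or from args.deepspeed_config, but raise if both are set.
    This is an illustrative stand-in, not DeepSpeed's real implementation."""
    args_config = getattr(args, "deepspeed_config", None) if args is not None else None
    if config is not None and args_config is not None:
        # Mirrors the spirit of the assertion message reported in this issue.
        raise AssertionError(
            "Not sure how to proceed, we were given deepspeed configs "
            "in both the deepspeed arguments and deepspeed.initialize"
        )
    return config if config is not None else args_config
```

Passing only one source works; passing both raises, which is why dropping `args=args` (or dropping `--deepspeed_config` from the command line) resolves the error.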