About converting a DS checkpoint to Transformers
misska1 commented
python tools/convert_checkpoint/deepspeed_to_megatron.py --target_tp 1 --target_pp 1 --input_folder checkpoints/tr11b-1B3-ml/checkpoints/main/global_step1/ --output_folder ./trans_checkpoints
Convert DeepSpeed Checkpoint to Megatron Checkpoint
args = Namespace(for_release=False, input_folder='checkpoints/tr11b-1B3-ml/checkpoints/main/global_step1/', output_folder='./trans_checkpoints', target_pp=1, target_tp=1)
Converting DeepSpeed checkpoint in checkpoints/tr11b-1B3-ml/checkpoints/main/global_step1/ to Megatron checkpoint in ./trans_checkpoints
Traceback (most recent call last):
  File "tools/convert_checkpoint/deepspeed_to_megatron.py", line 187, in <module>
    main()
  File "tools/convert_checkpoint/deepspeed_to_megatron.py", line 173, in main
    ds_checkpoint = DeepSpeedCheckpoint(args.input_folder, args.target_tp,
  File "/data/anaconda3/envs/ds/lib/python3.8/site-packages/deepspeed/checkpoint/deepspeed_checkpoint.py", line 72, in __init__
    self.zero_checkpoint = ZeROCheckpoint(dir)
  File "/data/anaconda3/envs/ds/lib/python3.8/site-packages/deepspeed/checkpoint/zero_checkpoint.py", line 26, in __init__
    assert self.num_files > 0, f'No ZeRO files found in {dir}'
AssertionError: No ZeRO files found in checkpoints/tr11b-1B3-ml/checkpoints/main/global_step1/
No ZeRO files were written when the checkpoint was saved during pretraining.
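One way to confirm this before running the converter is to list the checkpoint folder and count the ZeRO shard files. The snippet below is only a rough sketch: it assumes that DeepSpeed's ZeRO files contain the substring `zero_pp_rank_` in their names, which may differ between DeepSpeed versions. The helper `find_zero_files` is a hypothetical name, not part of DeepSpeed or Megatron-DeepSpeed.

```python
import sys
from pathlib import Path

# Assumption: ZeRO optimizer shard files contain this substring in their
# file names. Adjust it if your DeepSpeed version uses a different scheme.
ZERO_MARKER = "zero_pp_rank_"


def find_zero_files(folder):
    """Return the sorted names of files in `folder` that look like ZeRO shards.

    Hypothetical helper for diagnosis only; returns an empty list when the
    folder does not exist or holds no matching files.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and ZERO_MARKER in p.name
    )


if __name__ == "__main__":
    # Usage: python check_zero_files.py <checkpoint_step_folder>
    folder = sys.argv[1] if len(sys.argv) > 1 else "."
    names = find_zero_files(folder)
    print(f"{len(names)} ZeRO file(s) found in {folder}")
    for name in names:
        print("  " + name)
```

If this reports zero files for the `global_step1/` folder, the converter's assertion is expected to fail, and the cause is in how the checkpoint was saved rather than in the conversion script.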
飘荡着呢 commented
I have this problem too. How can it be solved?
Leon Song commented
I ran into the same problem. My guess is that the options ZERO_STAGE=0 and --fp16 cannot work together? With that combination, no ZeRO files are generated.
But I don't know how to solve it.