DeepSpeed version issue during pretraining
twwch opened this issue · comments
deepspeed train.py \
  --model_name_or_path /public/home/chenhao/models/llama-13b-hf \
  --model_max_length 1024 \
  --data_path ./data/data/data \
  --output_dir ./output \
  --num_train_epochs 1 \
  --per_device_train_batch_size 16 \
  --per_device_eval_batch_size 1 \
  --evaluation_strategy "no" \
  --save_strategy "steps" \
  --save_steps 100 \
  --save_total_limit 1 \
  --learning_rate 1.5e-5 \
  --warmup_steps 300 \
  --logging_steps 1 \
  --report_to "tensorboard" \
  --gradient_checkpointing True \
  --deepspeed configs/config.json \
  --fp16 True \
  --log_on_each_node False \
  --lr_scheduler_type "cosine" \
  --adam_beta1 0.9 \
  --adam_beta2 0.95 \
  --weight_decay 0.1
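The command passes `--deepspeed configs/config.json`, but that file is not shown in the issue. Below is a minimal sketch of what such a config might contain, consistent with the flags used (`--fp16 True`, per-device batch size 16); this is an assumption for illustration, not the poster's actual file, and the ZeRO stage in particular is a guess:

```shell
# Write a minimal, hypothetical DeepSpeed config matching the launch flags above
mkdir -p configs
cat > configs/config.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": 16,
  "gradient_accumulation_steps": 1,
  "fp16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2
  }
}
EOF
```

When using the HuggingFace Trainer integration, values such as the batch size can also be set to `"auto"` so they are filled in from the Trainer arguments instead of being duplicated here.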
Hi, we are using DeepSpeed version 0.8.3. Below is the output of running `ds_report` in our environment:
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
utils .................. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/root/anaconda3/envs/llama/lib/python3.10/site-packages/torch']
torch version .................... 1.12.0
deepspeed install path ........... ['/root/anaconda3/envs/llama/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.8.3+4d27225f, 4d27225f, master
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed wheel compiled w. ...... torch 1.12, cuda 11.3
This has nothing to do with DeepSpeed... Your AMD CPU simply doesn't support it. Switch to Intel, friend.
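Before swapping hardware, it is worth verifying the claim above: DeepSpeed's CPU ops rely on AVX-family vector instructions (my assumption about the failure mode; the reply does not say which instructions are missing). On Linux you can list the AVX flags your CPU advertises with:

```shell
# Print the AVX-family flags reported by the kernel (Linux-only);
# empty output means the CPU advertises no AVX support
grep -o 'avx[a-z0-9_]*' /proc/cpuinfo | sort -u
```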
Hi, please check whether your gcc version is below 10. If it is above 10, try downgrading gcc to version 7.