stanford-crfm / mistral

Mistral: A strong, northwesterly wind: Framework for transparent and accessible large-scale language model training, built with Hugging Face 🤗 Transformers.

Mistral Micro Eval Crashes With DeepSpeed

J38 opened this issue · comments

commented

When running the mistral-micro.yaml example, eval crashes. Sample output:

/nlp/scr/jebolton/miniconda3/envs/mistral/lib/python3.8/site-packages/transformers/trainer.py:2543: RuntimeWarning: Mean of empty slice.
  metrics[f"{metric_key_prefix}_loss"] = all_losses.mean().item()
/nlp/scr/jebolton/miniconda3/envs/mistral/lib/python3.8/site-packages/numpy/core/_methods.py:190: RuntimeWarning: invalid value encountered in divide
  ret = ret.dtype.type(ret / rcount)
{'eval_loss': nan, 'eval_runtime': 0.5983, 'eval_samples_per_second': 0.0, 'eval_steps_per_second': 0.0, 'epoch': 0.0}
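The `eval_loss: nan` follows directly from the warnings above: no per-batch losses appear to be gathered during evaluation under DeepSpeed, so the array the Trainer averages is empty, and NumPy's mean of an empty array is nan. A minimal sketch of that failure mode (the `all_losses` name mirrors the trainer.py line in the traceback; this is an illustration, not the actual Trainer code path):

```python
import numpy as np

# If evaluation produces no per-batch losses (as seems to happen
# here under DeepSpeed), the gathered loss array is empty.
all_losses = np.array([])

# Mean of an empty array raises the "Mean of empty slice" /
# "invalid value encountered in divide" RuntimeWarnings seen above
# and returns nan, which surfaces as eval_loss: nan in the metrics.
eval_loss = all_losses.mean().item()
print(eval_loss)  # nan
```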

This is with

deepspeed==0.6.5
torch==1.11.0
transformers==4.18.0
commented

Full command:

deepspeed --num_gpus 8 --num_nodes 1 --master_addr localhost --hostfile hostfile train.py --config conf/mistral-micro.yaml --nnodes 1 --nproc_per_node 8 --training_arguments.fp16 true --training_arguments.per_device_train_batch_size 4 --training_arguments.deepspeed conf/deepspeed/z2-small-conf.json  --run_id mistral-micro-deepspeed-8gpu

Fixed by #170.