Using anything > 2048 for batch_max_length during training results in CUDA index errors
corey-lambda opened this issue
This is on a machine with 8 A100 GPUs, 80 GB each.
The dataset is https://huggingface.co/datasets/openchat/openchat_sharegpt4_dataset/resolve/main/sharegpt_clean.json, converted to the conversation format described in the README and then tokenized with:
```bash
python -m ochat.data.generate_dataset --model-type openchat_v3.2 --model-path imone/LLaMA2_7B_with_EOT_token --in-files sharegpt_clean.jsonl --out-prefix .
```
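For reference, here is a minimal sketch of the `.json` → `.jsonl` step I used (the dataset downloads as a single JSON array, while the command above reads `sharegpt_clean.jsonl`). The actual per-record field mapping to the conversation format is whatever the README specifies; records are passed through unchanged in this sketch:

```python
import json

# Hypothetical pre-processing step: sharegpt_clean.json is one JSON array,
# but generate_dataset above expects line-delimited sharegpt_clean.jsonl.
with open("sharegpt_clean.json") as f:
    records = json.load(f)

with open("sharegpt_clean.jsonl", "w") as f:
    for rec in records:
        # Any renaming/filtering into the README's conversation format
        # would go here; this sketch writes each record as-is.
        f.write(json.dumps(rec) + "\n")
```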
Training command used:
```bash
deepspeed --num_gpus=8 --module ochat.training_deepspeed.train \
    --model_path imone/LLaMA2_7B_with_EOT_token \
    --data_prefix ./data/ \
    --save_path ./checkpoints/llama2-7b/ \
    --batch_max_len 4096 \
    --epochs 5 \
    --save_every 1 \
    --deepspeed \
    --deepspeed_config deepspeed_config.json \
    > info.log \
    2> error.log
```
Here are the stdout & stderr log files from running the above:
error.log
info.log
Update: The recommended batch_max_length works if you select `--model_path imone/Mistral_7B_with_EOT_token`.
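In case it helps triage, here is a quick check based purely on an assumption of mine: that the index error comes from sequence positions exceeding the checkpoint's configured context window, which would also explain why the Mistral checkpoint behaves differently. It just compares the two models' configs via `transformers`:

```python
from transformers import AutoConfig

# Assumption: batch_max_len > max_position_embeddings triggers the CUDA
# index error. Print the configured context length for both checkpoints.
for name in ("imone/LLaMA2_7B_with_EOT_token", "imone/Mistral_7B_with_EOT_token"):
    cfg = AutoConfig.from_pretrained(name)
    print(name, "max_position_embeddings =", cfg.max_position_embeddings)
```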