Fine-tuning on my own dataset raises the error below, but the official yi_example dataset does not. Why?

Elbaz-k opened this issue 4 months ago

Reminder

I have searched the GitHub Discussions and issues and have not found anything similar to this.

Environment

- OS:
- Python: 3.10
- PyTorch: 2.0.1+cu117
- CUDA: 11.7

Current Behavior

Fine-tuning on the official dataset runs without errors, but fine-tuning on my own dataset fails. The full error output is given below.

Expected Behavior

No response

Steps to Reproduce

Fine-tuning on a dataset I built myself produces the following error:
, '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model']
[2024-06-05 16:42:02,475] [INFO] [launch.py:256:main] process 2103326 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=3', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model']
[2024-06-05 16:42:02,476] [INFO] [launch.py:256:main] process 2103327 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=4', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model']
[2024-06-05 16:42:02,477] [INFO] [launch.py:256:main] process 2103328 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=5', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model']
[2024-06-05 16:42:02,478] [INFO] [launch.py:256:main] process 2103329 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=6', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model']
[2024-06-05 16:42:02,479] [INFO] [launch.py:256:main] process 2103330 spawned with command: ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=7', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model']
[2024-06-05 16:42:04,420] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-05 16:42:04,492] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[2024-06-05 16:42:04,508] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[2024-06-05 16:42:04,578] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-05 16:42:04,578] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-05 16:42:04,584] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-06-05 16:42:04,593] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[2024-06-05 16:42:04,672] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
/root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-06-05 16:42:06,483] [INFO] [comm.py:637:init_distributed] cdb=None
/root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-06-05 16:42:06,508] [INFO] [comm.py:637:init_distributed] cdb=None
/root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-06-05 16:42:06,524] [INFO] [comm.py:637:init_distributed] cdb=None
/root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-06-05 16:42:06,588] [INFO] [comm.py:637:init_distributed] cdb=None
/root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-06-05 16:42:06,598] [INFO] [comm.py:637:init_distributed] cdb=None
/root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-06-05 16:42:06,612] [INFO] [comm.py:637:init_distributed] cdb=None
/root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-06-05 16:42:07,062] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-06-05 16:42:07,062] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
/root/vision/anaconda3/envs/Yi/lib/python3.10/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-06-05 16:42:07,114] [INFO] [comm.py:637:init_distributed] cdb=None
tokenizer path exist
tokenizer path exist
tokenizer path exist
tokenizer path exist
tokenizer path exist
tokenizer path exist
tokenizer path exist
tokenizer path exist
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
You are attempting to use Flash Attention 2.0 with a model not initialized on GPU. Make sure to move the model to GPU after initializing it on CPU with model.to('cuda').
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaForCausalLM is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Flash Attention 2.0 only supports torch.float16 and torch.bfloat16 dtypes, but the current dype in LlamaModel is torch.float32. You should run training or inference using Automatic Mixed-Precision via the with torch.autocast(device_type='torch_device'): decorator, or load the model with the torch_dtype argument. Example: model = AutoModel.from_pretrained("openai/whisper-tiny", attn_implementation="flash_attention_2", torch_dtype=torch.float16)
Loading checkpoint shards: 100%|████████████████| 2/2 [00:10<00:00, 5.05s/it]
Loading checkpoint shards: 50%|████████ | 1/2 [00:10<00:10, 10.15s/it]
length of tokenizer is 64000
resize_token_embeddings is 64000
Loading checkpoint shards: 100%|████████████████| 2/2 [00:10<00:00, 5.49s/it]
length of tokenizer is 64000
Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.73s/it]
Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.74s/it]
resize_token_embeddings is 64000
Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.70s/it]
Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.72s/it]
Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.73s/it]
Loading checkpoint shards: 100%|████████████████| 2/2 [00:11<00:00, 5.70s/it]
length of tokenizer is 64000
length of tokenizer is 64000
length of tokenizer is 64000
length of tokenizer is 64000
resize_token_embeddings is 64000
resize_token_embeddings is 64000
length of tokenizer is 64000
length of tokenizer is 64000
resize_token_embeddings is 64000
resize_token_embeddings is 64000
resize_token_embeddings is 64000
resize_token_embeddings is 64000
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.536935567855835 seconds
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.6319587230682373 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.6258392333984375 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.648719310760498 seconds
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Using /root/.cache/torch_extensions/py310_cu117 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu117/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.706559419631958 seconds
Loading extension module cpu_adam...
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.735806703567505 seconds
Time to load cpu_adam op: 2.735208511352539 seconds
Loading extension module cpu_adam...
Time to load cpu_adam op: 2.7772958278656006 seconds
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1
[2024-06-05 16:42:26,674] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.14.2, git-hash=unknown, git-branch=unknown
[2024-06-05 16:42:26,674] [INFO] [comm.py:662:init_distributed] Distributed backend already initialized
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1
Adam Optimizer #0 is created with AVX512 arithmetic capability.
Config: alpha=0.000002, betas=(0.900000, 0.950000), weight_decay=0.000000, adam_w=1
[2024-06-05 16:42:50,655] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
[2024-06-05 16:42:50,656] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer
[2024-06-05 16:42:50,656] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
[2024-06-05 16:42:50,665] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = DeepSpeedCPUAdam
[2024-06-05 16:42:50,665] [INFO] [utils.py:56:is_zero_supported_optimizer] Checking ZeRO support for optimizer=DeepSpeedCPUAdam type=<class 'deepspeed.ops.adam.cpu_adam.DeepSpeedCPUAdam'>
[2024-06-05 16:42:50,665] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2024-06-05 16:42:50,666] [INFO] [stage_1_and_2.py:148:init] Reduce bucket size 500,000,000
[2024-06-05 16:42:50,666] [INFO] [stage_1_and_2.py:149:init] Allgather bucket size 500,000,000
[2024-06-05 16:42:50,666] [INFO] [stage_1_and_2.py:150:init] CPU Offload: True
[2024-06-05 16:42:50,666] [INFO] [stage_1_and_2.py:151:init] Round robin gradient partitioning: False
Traceback (most recent call last):
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
main()
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
perplexity = evaluation(model, eval_dataloader)
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment
Traceback (most recent call last):
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
main()
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
perplexity = evaluation(model, eval_dataloader)
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment
Traceback (most recent call last):
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
main()
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
perplexity = evaluation(model, eval_dataloader)
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment
Traceback (most recent call last):
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
main()
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
perplexity = evaluation(model, eval_dataloader)
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment
Traceback (most recent call last):
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
main()
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
perplexity = evaluation(model, eval_dataloader)
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment
[2024-06-05 16:43:20,446] [INFO] [utils.py:779:see_memory_usage] Before initializing optimizer states
[2024-06-05 16:43:20,447] [INFO] [utils.py:780:see_memory_usage] MA 11.78 GB Max_MA 11.78 GB CA 11.78 GB Max_CA 12 GB
[2024-06-05 16:43:20,447] [INFO] [utils.py:787:see_memory_usage] CPU Virtual Memory: used = 119.32 GB, percent = 15.8%
[2024-06-05 16:43:20,712] [INFO] [utils.py:779:see_memory_usage] After initializing optimizer states
[2024-06-05 16:43:20,712] [INFO] [utils.py:780:see_memory_usage] MA 11.78 GB Max_MA 11.78 GB CA 11.78 GB Max_CA 12 GB
[2024-06-05 16:43:20,713] [INFO] [utils.py:787:see_memory_usage] CPU Virtual Memory: used = 121.65 GB, percent = 16.1%
[2024-06-05 16:43:20,713] [INFO] [stage_1_and_2.py:543:__init__] optimizer state initialized
Traceback (most recent call last):
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
main()
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
perplexity = evaluation(model, eval_dataloader)
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment
[2024-06-05 16:43:20,823] [INFO] [utils.py:779:see_memory_usage] After initializing ZeRO optimizer
[2024-06-05 16:43:20,824] [INFO] [utils.py:780:see_memory_usage] MA 11.78 GB Max_MA 11.78 GB CA 11.78 GB Max_CA 12 GB
[2024-06-05 16:43:20,824] [INFO] [utils.py:787:see_memory_usage] CPU Virtual Memory: used = 122.85 GB, percent = 16.3%
[2024-06-05 16:43:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedCPUAdam
[2024-06-05 16:43:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler
[2024-06-05 16:43:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = <torch.optim.lr_scheduler.LambdaLR object at 0x7f99a3130df0>
[2024-06-05 16:43:20,826] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[2e-06], mom=[(0.9, 0.95)]
[2024-06-05 16:43:20,827] [INFO] [config.py:996:print] DeepSpeedEngine configuration:
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] activation_checkpointing_config {
"partition_activations": false,
"contiguous_memory_optimization": false,
"cpu_checkpointing": false,
"number_checkpoints": null,
"synchronize_checkpoint_boundary": false,
"profile": false
}
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] amp_enabled .................. False
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] amp_params ................... False
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] autotuning_config ............ {
"enabled": false,
"start_step": null,
"end_step": null,
"metric_path": null,
"arg_mappings": null,
"metric": "throughput",
"model_info": null,
"results_dir": "autotuning_results",
"exps_dir": "autotuning_exps",
"overwrite": true,
"fast": true,
"start_profile_step": 3,
"end_profile_step": 5,
"tuner_type": "gridsearch",
"tuner_early_stopping": 5,
"tuner_num_trials": 50,
"model_info_path": null,
"mp_size": 1,
"max_train_batch_size": null,
"min_train_batch_size": 1,
"max_train_micro_batch_size_per_gpu": 1.024000e+03,
"min_train_micro_batch_size_per_gpu": 1,
"num_tuning_micro_batch_sizes": 3
}
[2024-06-05 16:43:20,827] [INFO] [config.py:1000:print] bfloat16_enabled ............. False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] bfloat16_immediate_grad_update False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] checkpoint_parallel_write_pipeline False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] checkpoint_tag_validation_enabled True
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] checkpoint_tag_validation_fail False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] comms_config ................. <deepspeed.comm.config.DeepSpeedCommsConfig object at 0x7f99a3131c30>
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] communication_data_type ...... None
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] compile_config ............... enabled=False backend='inductor' kwargs={}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] curriculum_enabled_legacy .... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] curriculum_params_legacy ..... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] data_efficiency_enabled ...... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] dataloader_drop_last ......... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] disable_allgather ............ False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] dump_state ................... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'consecutive_hysteresis': False, 'min_scale': 1}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_enabled ........... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_gas_boundary_resolution 1
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_layer_name ........ bert.encoder.layer
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_layer_num ......... 0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_max_iter .......... 100
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_stability ......... 1e-06
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_tol ............... 0.01
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] eigenvalue_verbose ........... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] elasticity_enabled ........... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] flops_profiler_config ........ {
"enabled": false,
"recompute_fwd_factor": 0.0,
"profile_step": 1,
"module_depth": -1,
"top_modules": 1,
"detailed": true,
"output_file": null
}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] fp16_auto_cast ............... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] fp16_enabled ................. True
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] fp16_master_weights_and_gradients False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] global_rank .................. 0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] grad_accum_dtype ............. None
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] gradient_accumulation_steps .. 1
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] gradient_clipping ............ 1.0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] gradient_predivide_factor .... 1.0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] graph_harvesting ............. False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] initial_dynamic_scale ........ 65536
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] load_universal_checkpoint .... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] loss_scale ................... 0
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] memory_breakdown ............. False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] mics_hierarchial_params_gather False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] mics_shard_size .............. -1
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='sft_tensorboard/ds_tensorboard_logs/', job_name='sft_tensorboard') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] nebula_config ................ {
"enabled": false,
"persistent_storage_path": null,
"persistent_time_interval": 100,
"num_of_version_in_retention": 2,
"enable_nebula_load": true,
"load_path": null
}
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] optimizer_legacy_fusion ...... False
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] optimizer_name ............... None
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] optimizer_params ............. None
[2024-06-05 16:43:20,828] [INFO] [config.py:1000:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0, 'pipe_partitioned': True, 'grad_partitioned': True}
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] pld_enabled .................. False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] pld_params ................... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] prescale_gradients ........... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] scheduler_name ............... None
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] scheduler_params ............. None
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] seq_parallel_communication_data_type torch.float32
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] sparse_attention ............. None
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] sparse_gradients_enabled ..... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] steps_per_print .............. 10
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] train_batch_size ............. 8
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] train_micro_batch_size_per_gpu 1
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] use_data_before_expert_parallel_ False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] use_node_local_storage ....... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] wall_clock_breakdown ......... False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] weight_quantization_config ... None
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] world_size ................... 8
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_allow_untested_optimizer False
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=500,000,000 use_multi_rank_bucket_allreduce=True allgather_partitions=True allgather_bucket_size=500,000,000 overlap_comm=False load_from_fp32_weights=True elastic_checkpoint=False offload_param=DeepSpeedZeroOffloadParamConfig(device='cpu', nvme_path=None, buffer_count=5, buffer_size=100,000,000, max_in_cpu=1,000,000,000, pin_memory=False) offload_optimizer=DeepSpeedZeroOffloadOptimizerConfig(device='cpu', nvme_path=None, buffer_count=4, pin_memory=False, pipeline=False, pipeline_read=False, pipeline_write=False, fast_init=False, ratio=1.0) sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=30000000 param_persistence_threshold=10000 model_persistence_threshold=sys.maxsize max_live_parameters=30000000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False zero_hpz_partition_size=1 zero_quantized_weights=False zero_quantized_nontrainable_weights=False zero_quantized_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=False pipeline_loading_checkpoint=False override_module_apply=True
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_enabled ................. True
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_force_ds_cpu_optimizer .. True
[2024-06-05 16:43:20,829] [INFO] [config.py:1000:print] zero_optimization_stage ...... 2
[2024-06-05 16:43:20,829] [INFO] [config.py:986:print_user_config] json = {
"train_batch_size": 8,
"train_micro_batch_size_per_gpu": 1,
"steps_per_print": 10,
"zero_optimization": {
"stage": 2,
"offload_param": {
"device": "cpu"
},
"offload_optimizer": {
"device": "cpu"
},
"stage3_param_persistence_threshold": 1.000000e+04,
"stage3_max_live_parameters": 3.000000e+07,
"stage3_prefetch_bucket_size": 3.000000e+07,
"memory_efficient_linear": false
},
"fp16": {
"enabled": true,
"loss_scale_window": 100
},
"gradient_clipping": 1.0,
"prescale_gradients": false,
"wall_clock_breakdown": false,
"hybrid_engine": {
"enabled": false,
"max_out_tokens": 512,
"inference_tp_size": 1,
"release_inference_cache": false,
"pin_parameters": true,
"tp_gather_partition_size": 8
},
"tensorboard": {
"enabled": false,
"output_path": "sft_tensorboard/ds_tensorboard_logs/",
"job_name": "sft_tensorboard"
}
}
***** Running training *****
***** Evaluating perplexity, Epoch 0/4 *****
Traceback (most recent call last):
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
main()
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
perplexity = evaluation(model, eval_dataloader)
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment
[2024-06-05 16:43:21,571] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103323
[2024-06-05 16:43:25,215] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103324
[2024-06-05 16:43:25,216] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103325
[2024-06-05 16:43:25,242] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103326
[2024-06-05 16:43:26,191] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103327
[2024-06-05 16:43:26,215] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103328
[2024-06-05 16:43:26,228] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103329
[2024-06-05 16:43:26,240] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 2103330
[2024-06-05 16:43:26,251] [ERROR] [launch.py:325:sigkill_handler] ['/root/vision/anaconda3/envs/Yi/bin/python', '-u', 'main.py', '--local_rank=7', '--data_path', '/root/vision/Yi-main/Yi-main/finetune/yi_dataset', '--model_name_or_path', '/root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--max_seq_len', '4096', '--learning_rate', '2e-6', '--weight_decay', '0.', '--num_train_epochs', '4', '--training_debug_steps', '20', '--gradient_accumulation_steps', '1', '--lr_scheduler_type', 'cosine', '--num_warmup_steps', '0', '--seed', '1234', '--gradient_checkpointing', '--zero_stage', '2', '--deepspeed', '--offload', '--output_dir', '/root/vision/Yi-main/Yi-main/finetuned_model'] exits with return code = 1
The script I ran is:
#!/usr/bin/env bash

cd "$(dirname "${BASH_SOURCE[0]}")/../sft/"

deepspeed main.py \
  --data_path /root/vision/Yi-main/Yi-main/finetune/yi_dataset \
  --model_name_or_path /root/vision/Yi-main/Yi-main/checkpoint/Yi-6B-base \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --max_seq_len 4096 \
  --learning_rate 2e-6 \
  --weight_decay 0. \
  --num_train_epochs 4 \
  --training_debug_steps 20 \
  --gradient_accumulation_steps 1 \
  --lr_scheduler_type cosine \
  --num_warmup_steps 0 \
  --seed 1234 \
  --gradient_checkpointing \
  --zero_stage 2 \
  --deepspeed \
  --offload \
  --output_dir /root/vision/Yi-main/Yi-main/finetuned_model
However, if I switch the dataset to the official yi_example_dataset, fine-tuning succeeds. With my own dataset, the following error occurs:
Traceback (most recent call last):
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 415, in <module>
main()
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 362, in main
perplexity = evaluation(model, eval_dataloader)
File "/root/vision/Yi-main/Yi-main/finetune/sft/main.py", line 313, in evaluation
losses = losses / (step + 1)
UnboundLocalError: local variable 'step' referenced before assignment
Why does this happen?

Anything Else?

No response

Haijian Wang · Answer 1 · Thu Jul 25 2024 15:48:17 GMT+0800 (China Standard Time)

Hi Elbaz-k👋, so far this looks like a problem with how the fine-tuning framework converts your data. Please check the official documentation for the dataset formats the framework supports.
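
For anyone hitting the same traceback: an `UnboundLocalError` on `step` at `losses / (step + 1)` is what Python raises when a loop variable is read after a loop that never ran. One plausible reading, assuming `evaluation` in `main.py` iterates with something like `for step, batch in enumerate(eval_dataloader)` (the function body is not shown in this issue), is that the evaluation split of the custom dataset produced zero batches. The sketch below reproduces that pattern with plain Python lists instead of a real dataloader. The function names `evaluation_unguarded` and `evaluation_guarded` are hypothetical and are not part of the Yi code base.

```python
def evaluation_unguarded(batch_losses):
    """Mimics the suspected pattern: `step` is only bound inside the loop."""
    losses = 0.0
    for step, loss in enumerate(batch_losses):
        losses += loss
    # If `batch_losses` is empty, the loop body never runs and `step`
    # was never assigned, so this line raises UnboundLocalError.
    return losses / (step + 1)


def evaluation_guarded(batch_losses):
    """Same averaging, but reports an empty evaluation set explicitly."""
    losses = 0.0
    num_batches = 0
    for loss in batch_losses:
        losses += loss
        num_batches += 1
    if num_batches == 0:
        raise ValueError(
            "evaluation data is empty; check the eval split of the dataset"
        )
    return losses / num_batches


if __name__ == "__main__":
    try:
        evaluation_unguarded([])
    except UnboundLocalError as err:
        print("reproduced with an empty loader:", err)

    print("mean loss with two batches:", evaluation_guarded([0.5, 1.5]))
```

If this is indeed the cause, the thing to check is whether the evaluation file in your dataset directory exists, is non-empty, and uses the same field names as the official example dataset, since records that the data conversion step cannot parse may be dropped and leave the loader empty.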