EleutherAI / oslo

OSLO: Open Source for Large-scale Optimization

Home Page: https://oslo.eleuther.ai

'TrainingArguments' object has no attribute 'parallel_mode' when running mBart test

josemlopez opened this issue

How to reproduce

python ./tests/transformers/models/mbart/test_training.py

Environment

  • OS : CentOS 7.9
  • Python version : 3.9
  • Transformers version : 4.21.2
  • Whether to use Docker:
  • Misc.:

python ./tests/transformers/models/mbart/test_training.py
Reusing dataset glue (/root/.cache/huggingface/datasets/glue/sst2/1.0.0/dacbe3125aa31d7f70367a07a8a9e72a5a0bfeb5fc42e75c9db75b96da6053ad)
100%|███████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 682.67it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 68/68 [00:01<00:00, 52.15ba/s]
100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 43.15ba/s]
100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 42.94ba/s]
You are using a model of type bart to instantiate a model of type mbart. This is not supported for all configurations of models and can yield errors.
Some weights of MBartForConditionalGeneration were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['encoder.layer_norm.bias', 'decoder.layer_norm.weight', 'encoder.layer_norm.weight', 'decoder.layer_norm.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
You are using a model of type bart to instantiate a model of type mbart. This is not supported for all configurations of models and can yield errors.
Some weights of MBartForConditionalGeneration were not initialized from the model checkpoint at facebook/bart-base and are newly initialized: ['encoder.layer_norm.bias', 'decoder.layer_norm.weight', 'encoder.layer_norm.weight', 'decoder.layer_norm.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
PyTorch: setting up devices
Traceback (most recent call last):
  File "./tests/transformers/models/mbart/test_training.py", line 94, in <module>
    fp16=False,
  File "./tests/transformers/models/mbart/test_training.py", line 44, in train
    eval_dataset=dataset["validation"],
  File "/opt/conda/lib/python3.7/site-packages/oslo_core-3.0.0-py3.7.egg/oslo/transformers/trainer.py", line 186, in __init__
    if len(args.parallel_mode) > 0:
AttributeError: 'TrainingArguments' object has no attribute 'parallel_mode'

The problem seems to be that the parallel_mode property in training_args.py is commented out, at line 989:

# @property
# def parallel_mode(self):
#     """
#     The current mode used for parallelism if multiple GPUs/TPU cores are available. One of:
#
#     - ParallelMode.NOT_PARALLEL: no parallelism (CPU or one GPU).
#     - ParallelMode.NOT_DISTRIBUTED: several GPUs in one single process (uses torch.nn.DataParallel).
#     - ParallelMode.DISTRIBUTED: several GPUs, each having its own process (uses
#       torch.nn.DistributedDataParallel).
#     - ParallelMode.TPU: several TPU cores.
#     """
#     # if is_torch_tpu_available():
#     #     return ParallelMode.TPU
#     # elif is_sagemaker_mp_enabled():
#     #     return ParallelMode.SAGEMAKER_MODEL_PARALLEL
#     # elif is_sagemaker_dp_enabled():
#     #     return ParallelMode.SAGEMAKER_DATA_PARALLEL
#     if self.local_rank != -1:
#         return ParallelMode.DISTRIBUTED
#     elif self.n_gpu > 1:
#         return ParallelMode.NOT_DISTRIBUTED
#     else:
#         return ParallelMode.NOT_PARALLEL
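Note that even uncommenting the property as written would not fix the crash on its own: trainer.py line 186 calls len(args.parallel_mode), and a ParallelMode enum member has no length, so the Trainer apparently expects a sized collection of parallelism modes here. As a purely hypothetical stopgap (not OSLO's intended implementation), one can stub the property with an empty sequence before constructing the Trainer; the import path below is an assumption and may need adjusting to your install:

from oslo.transformers.training_args import TrainingArguments  # import path assumed

# Hypothetical stopgap, not OSLO's real implementation: trainer.py checks
# len(args.parallel_mode) > 0, so returning an empty tuple reads as "no
# parallelism configured" and skips that branch entirely.
TrainingArguments.parallel_mode = property(lambda self: ())

With the stub in place, Trainer.__init__ should get past the attribute check, though any parallel features gated behind it remain disabled.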

Currently, the trainer module is not ready to use.
I'll let you know when it becomes available. Thanks.