laekov / fastmoe

A fast MoE impl for PyTorch

Home Page: https://fastmoe.ai


'Namespace' object has no attribute 'balance_strategy'

Irenehere opened this issue · comments

Describe the bug
I am using the "megatron-patch" branch (the main branch cannot be installed successfully), and I modified the "pretrain_bert.py" script following the README in fastmoe/examples/megatron. When I run the pretrain_bert.py script in Megatron, I get the following error saying that the arguments have no "balance_strategy" attribute. Where should I add this argument?

Logs

```
[after megatron is initialized] datetime: 2022-09-09 07:46:46
building BERT model ...
Traceback (most recent call last):
  File "pretrain_bert.py", line 154, in <module>
    pretrain(train_valid_test_datasets_provider, model_provider, forward_step, args_defaults={'tokenizer_type': 'BertWordPieceLowerCase'})
  File "/home/jovyan/data/Megatron-LM-2.2/megatron/training.py", line 108, in pretrain
    model, optimizer, lr_scheduler = setup_model_and_optimizer(model_provider)
  File "/home/jovyan/data/Megatron-LM-2.2/megatron/training.py", line 268, in setup_model_and_optimizer
    model = get_model(model_provider_func)
  File "/home/jovyan/data/Megatron-LM-2.2/megatron/training.py", line 184, in get_model
    model = model_provider_func()
  File "pretrain_bert.py", line 55, in model_provider
    model = fmoefy(model, num_experts=4)
  File "/opt/conda/lib/python3.8/site-packages/fastmoe-0.1.2-py3.8-linux-x86_64.egg/fmoe/megatron/layers.py", line 183, in fmoefy
    l.mlp = MegatronMLP(args, mpu.get_model_parallel_group(), idx)
  File "/opt/conda/lib/python3.8/site-packages/fastmoe-0.1.2-py3.8-linux-x86_64.egg/fmoe/megatron/layers.py", line 87, in __init__
    if not args.balance_strategy or args.balance_strategy == "gshard":
AttributeError: 'Namespace' object has no attribute 'balance_strategy'
```

Platform

  • Device: NVIDIA V100
  • CUDA version: 11.1
  • NCCL version: 2.8.0
  • PyTorch version: 1.8.0

Can you please check whether `_add_fmoe_args` is properly added to your `megatron/arguments.py`?

Solved. Thanks for your help.