microsoft / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Error message in checkpoint_{loader, saver}_megatron.py is usually a red herring

Mittagskogel opened this issue

try:
    from megatron.arguments import parse_args, validate_args
    from megatron.global_vars import set_args, set_global_variables
    from megatron.checkpointing import load_args_from_checkpoint, load_checkpoint
    from megatron.model import module
    from megatron.core import mpu
    from megatron.core.enums import ModelType
    from megatron import fused_kernels
except ModuleNotFoundError:
    print("Unable to import Megatron, please specify the path to Megatron using --megatron-path. Exiting.")
    queue.put("exit")
    exit(1)

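The problem here is usually not a failure to find the megatron directory, but rather an outdated megatron checkout or a missing subdependency. Because the handler catches every ModuleNotFoundError and discards it, the printed message points at --megatron-path even when the path is correct. Please remove the error message, or at least print the underlying error before suggesting the megatron directory, instead of assuming that the path is the root of the issue.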
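
One possible shape for this, as a rough sketch rather than a proposed patch: `ModuleNotFoundError` records the name of the module that could not be found in its `name` attribute, so the handler can tell a missing `megatron` package apart from a missing submodule or third-party dependency. The helper names `describe_import_failure` and `import_all` are hypothetical and do not exist in the repository, and the wording of the hints is only an assumption about what would be useful.

```python
import importlib


def describe_import_failure(err, package="megatron"):
    """Explain a failed import using the module name recorded on the exception."""
    # err.name is the module Python could not find; it may be None if the
    # exception was raised by hand without a name.
    missing = err.name or "<unknown>"
    if missing == package:
        # Only in this case is the package path the likely culprit.
        hint = f"'{package}' itself was not found; try --megatron-path."
    elif missing.startswith(package + "."):
        hint = f"'{package}' was found but lacks '{missing}'; the checkout may be outdated."
    else:
        hint = f"dependency '{missing}' is not installed."
    # Always include the underlying error text so nothing is hidden.
    return f"Unable to import Megatron ({err}): {hint}"


def import_all(names, package="megatron"):
    """Import each module in names; return (modules, error_message_or_None)."""
    modules = {}
    for name in names:
        try:
            modules[name] = importlib.import_module(name)
        except ModuleNotFoundError as err:
            return modules, describe_import_failure(err, package)
    return modules, None
```

In the loader and saver scripts, the returned message could be printed before `queue.put("exit")`, so the user sees which module was actually missing before being told to check `--megatron-path`.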