Make Pipeline Parallelism Optional
XinDongol opened this issue
Xin (Simon) Dong commented
When building a customized model, do we need to make sure all blocks are `PipelineBlock`?

I was trying to build a model with only data parallelism and tensor parallelism, but got an error at the line `thresholds = [block_cumulative_costs[-1] * ((rank + 1) / pp_size) for rank in range(pp_size)]` in `build_model`:
```python
def build_model(
    model_builder: Callable[[], NanotronModel],
    parallel_context: ParallelContext,
    dtype: torch.dtype,
    target_pp_ranks: Optional[List[int]] = None,
    device: Optional[torch.device] = torch.device("cuda"),
) -> NanotronModel:
    """Build the model and set the pp ranks for each pipeline block."""
    # TODO: classes dont take same args
    log_rank("Building model..", logger=logger, level=logging.INFO, rank=0, group=parallel_context.world_pg)
    model: NanotronModel = model_builder()

    # If no target pp ranks are specified, we assume that we want to use all pp ranks
    if target_pp_ranks is None:
        pp_size = parallel_context.pp_pg.size()
        target_pp_ranks = list(range(pp_size))
    else:
        pp_size = len(target_pp_ranks)

    # Set rank for each pipeline block
    log_rank("Setting PP block ranks...", logger=logger, level=logging.INFO, rank=0, group=parallel_context.world_pg)
    pipeline_blocks = [module for name, module in model.named_modules() if isinstance(module, PipelineBlock)]
    # "cuda" is already defaulted for each process to it's own cuda device
    with init_on_device_and_dtype(device=device, dtype=dtype):
        # TODO: https://github.com/huggingface/nanotron/issues/65
        # Balance compute across PP blocks
        block_compute_costs = model.get_block_compute_costs()
        block_cumulative_costs = np.cumsum(
            [
                block_compute_costs[module.module_builder] if module.module_builder in block_compute_costs else 0
                for module in pipeline_blocks
            ]
        )

        thresholds = [block_cumulative_costs[-1] * ((rank + 1) / pp_size) for rank in range(pp_size)]
        assert thresholds[-1] >= block_cumulative_costs[-1]
        target_pp_rank_idx = 0
        for block, cumulative_cost in zip(pipeline_blocks, block_cumulative_costs):
            assert target_pp_rank_idx < pp_size
            block.build_and_set_rank(target_pp_ranks[target_pp_rank_idx])

            if cumulative_cost > thresholds[target_pp_rank_idx]:
                target_pp_rank_idx += 1

        model.input_pp_rank = target_pp_ranks[0]
        model.output_pp_rank = target_pp_ranks[target_pp_rank_idx]
    return model
```
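If I read this right, the failure is consistent with `pipeline_blocks` being empty when none of the model's modules is a `PipelineBlock`: `np.cumsum` over an empty list returns an empty array, so indexing it with `[-1]` raises an `IndexError`. A minimal sketch of that assumption:

```python
import numpy as np

# Sketch of the suspected failure mode (assumption: the custom model has no
# PipelineBlock modules, so `pipeline_blocks` is empty).
pipeline_blocks = []
block_cumulative_costs = np.cumsum([0 for _ in pipeline_blocks])  # empty array
block_cumulative_costs[-1]  # IndexError: index -1 is out of bounds for axis 0 with size 0
```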
XλRI-U5 commented
Hello. Yes, you can still build a model without `PipelineBlock`, but you'll need to make some modifications because the current version of Nanotron's `Trainer` isn't designed for this.

For example, you could bypass the `build_model` function [link] and directly initialize the model weights.
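A minimal sketch of that bypass, assuming the custom model contains no `PipelineBlock` (so there are no PP ranks to assign) and reusing the names from the snippet above; the dtype and the normal-init scheme for linear layers are placeholders, not Nanotron's actual defaults:

```python
import torch

# Assumption: `model_builder` returns a model built from ordinary nn.Modules
# (no PipelineBlock), so its parameters are materialized eagerly inside the
# context manager rather than lazily via block.build_and_set_rank(...).
# `init_on_device_and_dtype` is the same helper used in `build_model` above.
with init_on_device_and_dtype(device=torch.device("cuda"), dtype=torch.bfloat16):
    model = model_builder()

# Initialize the weights directly (placeholder scheme: N(0, 0.02) for linear
# layers; substitute your own init).
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        torch.nn.init.normal_(module.weight, mean=0.0, std=0.02)
        if module.bias is not None:
            torch.nn.init.zeros_(module.bias)
```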