EleutherAI / oslo

OSLO: Open Source for Large-scale Optimization

https://oslo.eleuther.ai

Lazy Parallelization

hyunwoongko opened this issue a year ago · comments

Kevin Ko commented a year ago

Describe a TODO feature

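Lazy parallelization: defer the actual parallelization until oslo.ready is called.
This is needed when Pipeline Parallelism is combined with Tensor Parallelism, because tensor parallelization must be applied before pipeline parallelization, regardless of the order in which the wrappers are added.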

Brief design

model = ...
model = PipelineParallel(model)  # --> only adds the _PipelineParallel wrapper to the model.oslo_wrappers dictionary; nothing is parallelized yet.
model = TensorParallel(model)  # --> same as above

oslo.ready(model)  # --> the actual parallelization happens here, in the order tensor -> pipeline

class _TensorParallelism:
    def __init__(self):
        self.oslo_parallel_priority = 1  # tensor parallelization should be performed earlier

class _PipelineParallelism:
    def __init__(self):
        self.oslo_parallel_priority = 0

We can then sort the parallel wrappers by this variable when oslo.ready is called.
What do you think about this? @ohwi @bzantium @jason9693
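
To make the proposal concrete, here is a minimal, framework-free sketch of the idea. It is not OSLO's actual implementation. The `Model` stand-in, the `_register` helper, the `parallelize` method, and the `applied` log are all hypothetical names added for illustration, and the sketch assumes that a higher `oslo_parallel_priority` means "apply first", which is what the values above imply (tensor = 1, pipeline = 0).

```python
class Model:
    """Hypothetical stand-in for a real model; not part of OSLO."""

    def __init__(self):
        self.oslo_wrappers = {}  # wrappers registered lazily, keyed by class name
        self.applied = []        # records the order in which parallelization ran


class _TensorParallelism:
    def __init__(self):
        self.oslo_parallel_priority = 1  # assumed: higher value is applied first

    def parallelize(self, model):
        # Placeholder for the real tensor-parallel sharding logic.
        model.applied.append("tensor")


class _PipelineParallelism:
    def __init__(self):
        self.oslo_parallel_priority = 0

    def parallelize(self, model):
        # Placeholder for the real pipeline partitioning logic.
        model.applied.append("pipeline")


def _register(model, wrapper):
    # Only record the wrapper; do not touch the model's parameters yet.
    model.oslo_wrappers[type(wrapper).__name__] = wrapper
    return model


def PipelineParallel(model):
    return _register(model, _PipelineParallelism())


def TensorParallel(model):
    return _register(model, _TensorParallelism())


def ready(model):
    # Sort by priority, highest first, so tensor parallelism runs before
    # pipeline parallelism no matter the order the user wrapped the model in.
    wrappers = sorted(
        model.oslo_wrappers.values(),
        key=lambda w: w.oslo_parallel_priority,
        reverse=True,
    )
    for wrapper in wrappers:
        wrapper.parallelize(model)
    return model


model = Model()
model = PipelineParallel(model)  # registered first, but applied second
model = TensorParallel(model)
assert model.applied == []       # nothing has been parallelized yet
ready(model)
print(model.applied)
```

With this design the user-facing call order no longer matters. Only the priority attribute decides the execution order, so adding a new parallelism type would just mean choosing a priority for it.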

Assignees

@hyunwoongko