PaddleDDPMachine does not call DataParallel.forward but Machine.forward, causing error in DDP training
rudaoshi opened this issue
孙明明 commented
In the SuperviseOperator, `self.machine.forward_with_validation` is called. However, when the machine is a `PaddleDDPMachine`, according to the following code, `getattr(self.module, "forward_with_validation")` is what actually gets called. The `self` inside that method then becomes the original machine, not the DataParallel-wrapped machine, so any `forward` call made there goes to `Machine.forward` rather than `DataParallel.forward`.
class PaddleDDPMachine(paddle.DataParallel):
    def __init__(self, machine, *args, **kwargs):
        super().__init__(machine, *args, **kwargs)

    def __getattr__(self, name):
        try:
            return paddle.DataParallel.__getattr__(self, name)
        except AttributeError as e:
            if hasattr(self.module, name):
                return getattr(self.module, name)
            else:
                print(f"what happens {self.__dict__}")
                raise e
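
The delegation behavior described above can be reproduced without Paddle. The following is a minimal sketch in plain Python; `Machine` and `Wrapper` are hypothetical stand-ins for the real machine and for `paddle.DataParallel`, not the actual classes. It only illustrates that a method fetched through `__getattr__` delegation stays bound to the inner object, so the wrapper's own `forward` is skipped.

```python
class Machine:
    """Hypothetical stand-in for the original machine."""

    def forward(self, x):
        return f"Machine.forward({x})"

    def forward_with_validation(self, x):
        # `self` is whatever object this method was bound to at lookup time.
        return self.forward(x)


class Wrapper:
    """Hypothetical stand-in for a DataParallel-style wrapper."""

    def __init__(self, module):
        self.module = module

    def forward(self, x):
        # A real wrapper would run its own logic here before delegating.
        return f"Wrapper.forward -> {self.module.forward(x)}"

    def __getattr__(self, name):
        # Invoked only when normal lookup fails; delegates to the inner module.
        return getattr(self.module, name)


wrapped = Wrapper(Machine())

# The delegated method is bound to the inner Machine, not to the wrapper.
bound_to_inner = wrapped.forward_with_validation.__self__ is wrapped.module

delegated_result = wrapped.forward_with_validation(1)  # skips Wrapper.forward
direct_result = wrapped.forward(1)                     # goes through Wrapper.forward

print(bound_to_inner)    # True
print(delegated_result)  # Machine.forward(1)
print(direct_result)     # Wrapper.forward -> Machine.forward(1)
```

In this sketch, only the direct `wrapped.forward(1)` call passes through the wrapper, which matches the symptom reported here: anything reached through the `__getattr__` fallback runs entirely on the unwrapped machine.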