deepcopy exception occurred while running lr_find

csaroff opened this issue · comments

Great library! I had actually independently implemented some of the loss utilities that you wrote. Glad to see them in a public repo with better names.

Unfortunately, after importing with from fastxtend.vision.all import *, running learn.lr_find fails because it attempts to deepcopy learn.dls.

Here's the top of the stack trace, stepping up through the frames in ipdb:

PDB Debugger Session

ipdb> up
> /home/csaroff/.miniconda3/lib/python3.9/site-packages/torch/_tensor.py(1279)__torch_function__()
   1278         with _C.DisableTorchFunction():
-> 1279             ret = func(*args, **kwargs)
   1280             if func in get_default_nowrap_functions():
   1281                 return ret

ipdb> up
> /home/csaroff/.miniconda3/lib/python3.9/site-packages/fastai/torch_core.py(372)__torch_function__()
    370         if cls.debug and func.__name__ not in ('__str__','__repr__'): print(func, types, args, kwargs)
    371         if _torch_handled(args, cls._opt, func): types = (torch.Tensor,)
--> 372         res = super().__torch_function__(func, types, args, ifnone(kwargs, {}))
    373         dict_objs = _find_args(args) if args else _find_args(list(kwargs.values()))
    374         if issubclass(type(res),TensorBase) and dict_objs: res.set_meta(dict_objs[0],as_copy=True)

ipdb> up
> /home/csaroff/.miniconda3/lib/python3.9/site-packages/torch/overrides.py(1534)handle_torch_function()
   1532         # Use `public_api` instead of `implementation` so __torch_function__
   1533         # implementations can do equality/identity comparisons.
-> 1534         result = torch_func_method(public_api, types, args, kwargs)
   1536         if result is not NotImplemented:

ipdb> p result
*** NameError: name 'result' is not defined
ipdb> up
> /home/csaroff/.miniconda3/lib/python3.9/site-packages/torch/_tensor.py(100)__deepcopy__()
     98     def __deepcopy__(self, memo):
     99         if has_torch_function_unary(self):
--> 100             return handle_torch_function(Tensor.__deepcopy__, (self,), self, memo)
    101         if not self.is_leaf:
    102             raise RuntimeError(

ipdb> p self
TensorBase([[[-1., -1.],
             [-1.,  1.],
             [ 1., -1.],
             [ 1.,  1.]]], device='cuda:0')
ipdb> p self.name
ipdb> up
> /home/csaroff/.miniconda3/lib/python3.9/copy.py(153)deepcopy()
    151             copier = getattr(x, "__deepcopy__", None)
    152             if copier is not None:
--> 153                 y = copier(memo)
    154             else:
    155                 reductor = dispatch_table.get(cls)

ipdb> p memo
*** AttributeError: fs
ipdb> p copier
<bound method Tensor.__deepcopy__ of TensorBase([[[-1., -1.],
             [-1.,  1.],
             [ 1., -1.],
             [ 1.,  1.]]], device='cuda:0')>
ipdb> up
> /home/csaroff/.miniconda3/lib/python3.9/copy.py(230)_deepcopy_dict()
    228     memo[id(x)] = y
    229     for key, value in x.items():
--> 230         y[deepcopy(key, memo)] = deepcopy(value, memo)
    231     return y
    232 d[dict] = _deepcopy_dict

ipdb> p key
ipdb> p value
TensorBase([[[-1., -1.],
             [-1.,  1.],
             [ 1., -1.],
             [ 1.,  1.]]], device='cuda:0')

Thanks for the kind words.

If you pass restore_state=False to fastxtend's Learner.lr_find, it should behave exactly like the fastai version of lr_find and skip the deepcopy (and also skip restoring the random state).
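In a real session the call would be learn.lr_find(restore_state=False). As a rough illustration of what the flag changes, here is a small stand-in written with only the standard library. It is not the fastxtend implementation: lr_find_sketch, FakeLearner, and Uncopyable are hypothetical names, and the sketch simply assumes that the state-restoring path is the one that deep-copies the dataloaders.

```python
import copy
import random


class Uncopyable:
    """Hypothetical stand-in for an object whose deepcopy fails."""

    def __deepcopy__(self, memo):
        raise RuntimeError("deepcopy is not supported for this object")


class FakeLearner:
    """Hypothetical minimal learner that only holds a `dls` attribute."""

    def __init__(self, dls):
        self.dls = dls


def lr_find_sketch(learn, restore_state=True):
    """Illustrative only: snapshot state before a search, restore it afterwards."""
    if restore_state:
        # This is the step that raises when `learn.dls` cannot be deep-copied.
        saved_dls = copy.deepcopy(learn.dls)
        saved_rng = random.getstate()
    # ... the learning-rate search itself would run here ...
    if restore_state:
        learn.dls = saved_dls
        random.setstate(saved_rng)
    return "done"


learn = FakeLearner(Uncopyable())

try:
    lr_find_sketch(learn)  # default path attempts the deepcopy
    failed = False
except RuntimeError:
    failed = True

# With the flag off, no snapshot is taken, so the deepcopy is never attempted.
result = lr_find_sketch(learn, restore_state=False)
```

The trade-off in this sketch is the same one described above: skipping the snapshot avoids the failing copy, but nothing (including the random state) is restored afterwards.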

I was able to reproduce the error by passing aug_transforms to a fastai DataLoader's batch_tfms. The PyTorch error message and related code suggest that fastai's TensorBase may not correctly implement the methods deepcopy requires, but I need to investigate further.
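For context, this is roughly what supporting deepcopy looks like for a generic Python class that carries extra metadata. It is only a sketch of the general protocol, using a hypothetical MetaArray class with a plain list payload. It is not fastai's TensorBase and not the eventual fix.

```python
import copy


class MetaArray:
    """Hypothetical container: a list payload plus arbitrary metadata attributes."""

    def __init__(self, data, **meta):
        self.data = data
        self.__dict__.update(meta)

    def __deepcopy__(self, memo):
        cls = type(self)
        new = cls.__new__(cls)  # create the copy without calling __init__
        memo[id(self)] = new    # register early so shared or cyclic references reuse it
        for key, value in self.__dict__.items():
            setattr(new, key, copy.deepcopy(value, memo))
        return new


original = MetaArray([[1.0, -1.0], [-1.0, 1.0]], name="example")

# A dict holding the same object twice, similar to the _deepcopy_dict frame in the trace.
holder = {"value": original, "again": original}
cloned = copy.deepcopy(holder)
```

Because the copy is recorded in memo, both dictionary entries in cloned point at one new object rather than two separate copies, and the metadata travels with it.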

Perfect! Now I can import * again. Thanks!

fastai/fastai#3882 should resolve this issue in the next release of fastai.