warner-benjamin / fastxtend

Train fastai models faster (and other useful tools)

Home Page: https://fastxtend.benjaminwarner.dev

Progressive Resize Runs Twice during Fine Tune Step

csaroff opened this issue

Specifying ProgressiveResize() in the callbacks list and calling learn.fine_tune leads to ProgressiveResize being run for two separate training runs.

If I manually call all of the fine_tune steps, I can add the callback to only the unfrozen epochs, but then the frozen epochs run at the full Resize size rather than the initial size.

My hypothesis is that we would see better training performance if the frozen epochs were run at the initial size. What's the simplest way to accomplish this with the callback?

Unfortunately, in fastai there isn't a built-in way for a callback to know the context it's being called in, other than checking whether another callback exists. ProgressiveResize can tell if you are predicting or using LRFinder via this code in ProgressiveResize.before_fit and prevent itself from running:

# Skip progressive resizing when the LRFinder or prediction (gather_preds) is running
if hasattr(self.learn, 'lr_finder') or hasattr(self.learn, 'gather_preds'):
    self.run = False
    return

but it cannot tell whether it is being called by fine_tune or by fit_one_cycle (or another fit method), nor whether it is being called by the frozen or the unfrozen part of fine_tune.

The solution is either to manually run all of the fine_tune steps as you are doing, except with two dataloaders (an initial-size dataloader for the frozen epochs and a full-size dataloader for the unfrozen epochs), or to create your own custom fine_tune method which takes the initial- and full-size dataloaders and a list of unfrozen-only callbacks.
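Something along these lines could work as a starting point for the second option (the function name, argument names, and hyperparameter defaults below are only placeholders, and I'm assuming ProgressiveResize is imported from fastxtend.vision.all):

from fastai.vision.all import *
from fastxtend.vision.all import *  # assumed to export ProgressiveResize

def fine_tune_progressive(learn, epochs, initial_dls, full_dls, base_lr=2e-3,
                          freeze_epochs=1, lr_mult=100, unfrozen_cbs=None):
    "Frozen epochs at the initial size, then unfrozen epochs with unfrozen-only callbacks."
    # Frozen phase: small-image dataloaders, no progressive resizing
    learn.dls = initial_dls
    learn.freeze()
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99)

    # Unfrozen phase: full-size dataloaders plus the unfrozen-only callbacks
    learn.dls = full_dls
    base_lr /= 2
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr),
                        cbs=unfrozen_cbs or [ProgressiveResize()])

You would then call it with your two sets of dataloaders, e.g. fine_tune_progressive(learn, 20, small_dls, full_dls), passing any extra unfrozen-only callbacks via unfrozen_cbs.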

Makes sense. Have you experimented with this at all? Any recommendations on how best to mix progressive resizing with transfer learning?

For context, I'm using CutMixUpAugment and ProgressiveResize together, but it's weird that the accuracy is obliterated for the first couple of epochs.

I have not. The best resources on progressive resizing are the fastai course and MosaicML's documentation, both of which I link to in the fastxtend ProgressiveResize documentation.

My guess is CutMixUpAugment is the primary culprit. MixUp and CutMix usually achieve their best results on longer training runs, around 60-80 epochs on an Imagenette-sized dataset. I would try not applying CutMixUpAugment during the frozen training, since there you're adapting a randomly initialized new head to the existing network. I'd also try only applying CutMixUpAugment when training longer, or using augment_finetune to delay when CutMixUp is applied.
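For example, one way to keep CutMixUpAugment out of the frozen phase is to leave it off the Learner and pass it only to the unfrozen fit call. This is just a sketch: the dataset, model, epoch counts, and learning rates are illustrative, and I'm assuming the callbacks are imported from fastxtend.vision.all:

from fastai.vision.all import *
from fastxtend.vision.all import *  # assumed to export CutMixUpAugment and ProgressiveResize

path = untar_data(URLs.IMAGENETTE_160)
dls = ImageDataLoaders.from_folder(path, valid='val', item_tfms=Resize(160),
                                   batch_tfms=Normalize.from_stats(*imagenet_stats))
learn = vision_learner(dls, resnet50, metrics=accuracy)

# Frozen phase: adapt the randomly initialized head without MixUp/CutMix
learn.freeze()
learn.fit_one_cycle(1, 2e-3, pct_start=0.99)

# Unfrozen phase: apply CutMixUpAugment (and ProgressiveResize) only here
learn.unfreeze()
learn.fit_one_cycle(20, slice(1e-5, 1e-3),
                    cbs=[CutMixUpAugment(), ProgressiveResize()])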

@warner-benjamin Based on some basic experimentation, it does seem like CutMixUpAugment is the culprit. I'll try incorporating your suggestions. I appreciate the resources and support!

I don't know if it's a bug in the callback or just behavior I don't fully understand, but running CutMixUpAugment with element=False dramatically improved the early-epoch performance.

You can look at the documentation to see examples of element=False and element=True. element=True mixes MixUp, CutMix, and additional augmentations within the same batch, while element=False selects one of the three per batch.
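For example (only the element keyword is taken from this thread; any other arguments are whatever defaults the library ships):

from fastxtend.vision.all import *  # assumed to export CutMixUpAugment

# element=True: MixUp, CutMix, and augmentations are mixed within a single batch
cutmixup_element = CutMixUpAugment(element=True)

# element=False: one of MixUp, CutMix, or augmentations is chosen for the whole batch
cutmixup_batch = CutMixUpAugment(element=False)

# Pass whichever variant via cbs=[...] to the Learner or to an individual fit call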