warner-benjamin / fastxtend

Train fastai models faster (and other useful tools)

Home Page: https://fastxtend.benjaminwarner.dev

Progressive Resize Runs Twice during Fine Tune Step

csaroff opened this issue

Specifying ProgressiveResize() in the callbacks list and calling learn.fine_tune leads to ProgressiveResize being run for two separate training runs.

If I manually call all of the fine_tune steps, I can add the callback to only the unfrozen epochs, but then the frozen epochs run at the full Resize size rather than the initial size.

My hypothesis is that we would see better training performance if the frozen epochs were run at the initial size. What's the simplest way to accomplish this with the callback?

Unfortunately, in fastai there isn't a built-in way for a callback to know the context it's being called in, other than checking whether another callback exists. ProgressiveResize can tell if you are predicting or using LRFinder via this code in ProgressiveResize.before_fit and prevent itself from running:

# Skip progressive resizing when the LRFinder or prediction (gather_preds) is running
if hasattr(self.learn, 'lr_finder') or hasattr(self.learn, 'gather_preds'):
    self.run = False
    return

but it cannot tell whether it is being called by fine_tune or by fit_one_cycle (or another fit method), nor whether it is being called by the frozen or the unfrozen part of fine_tune.

The solution is either to manually run all of the fine_tune steps as you are doing, except with two dataloaders (an initial-size dataloader for the frozen epochs and a full-size dataloader for the unfrozen epochs), or to create your own custom fine_tune method which takes the initial- and full-size dataloaders and a list of unfrozen-only callbacks.
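Something along these lines could work as a starting point for the second option (the function name, argument names, and hyperparameter defaults below are only placeholders, and I'm assuming ProgressiveResize is imported from fastxtend.vision.all):

from fastai.vision.all import *
from fastxtend.vision.all import *  # assumed to export ProgressiveResize

def fine_tune_progressive(learn, epochs, initial_dls, full_dls, base_lr=2e-3,
                          freeze_epochs=1, lr_mult=100, unfrozen_cbs=None):
    "Frozen epochs at the initial size, then unfrozen epochs with unfrozen-only callbacks."
    # Frozen phase: small-image dataloaders, no progressive resizing
    learn.dls = initial_dls
    learn.freeze()
    learn.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99)

    # Unfrozen phase: full-size dataloaders plus the unfrozen-only callbacks
    learn.dls = full_dls
    base_lr /= 2
    learn.unfreeze()
    learn.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr),
                        cbs=unfrozen_cbs or [ProgressiveResize()])

You would then call it with your two sets of dataloaders, e.g. fine_tune_progressive(learn, 20, small_dls, full_dls), passing any extra unfrozen-only callbacks via unfrozen_cbs.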

Makes sense. Have you experimented with this at all? Any recommendations on how best to mix progressive resizing with transfer learning?

For context, I'm using CutMixUpAugment and ProgressiveResize together, but it's weird that the accuracy is obliterated for the first couple of epochs.

I have not. The best resources on progressive resizing are the fastai course and MosaicML's documentation, both of which I link to in the fastxtend ProgressiveResize documentation.

My guess is CutMixUpAugment is the primary culprit. MixUp and CutMix usually achieve their best results on longer training runs, around 60-80 epochs on an Imagenette-sized dataset. I would try not applying CutMixUpAugment during the frozen training, since there you're adapting a randomly initialized new head to the existing network. I'd also try only applying CutMixUpAugment when training longer, or using augment_finetune to delay when CutMixUp is applied.
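For example, one way to keep CutMixUpAugment out of the frozen phase is to leave it off the Learner and pass it only to the unfrozen fit call. This is just a sketch: the dataset, model, epoch counts, and learning rates are illustrative, and I'm assuming the callbacks are imported from fastxtend.vision.all:

from fastai.vision.all import *
from fastxtend.vision.all import *  # assumed to export CutMixUpAugment and ProgressiveResize

path = untar_data(URLs.IMAGENETTE_160)
dls = ImageDataLoaders.from_folder(path, valid='val', item_tfms=Resize(160),
                                   batch_tfms=Normalize.from_stats(*imagenet_stats))
learn = vision_learner(dls, resnet50, metrics=accuracy)

# Frozen phase: adapt the randomly initialized head without MixUp/CutMix
learn.freeze()
learn.fit_one_cycle(1, 2e-3, pct_start=0.99)

# Unfrozen phase: apply CutMixUpAugment (and ProgressiveResize) only here
learn.unfreeze()
learn.fit_one_cycle(20, slice(1e-5, 1e-3),
                    cbs=[CutMixUpAugment(), ProgressiveResize()])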

@warner-benjamin Based on some basic experimentation, it does seem like CutMixUpAugment is the culprit. I'll try incorporating your suggestions. I appreciate the resources and support!

I don't know if it's a bug in the callback or just behavior I don't fully understand, but running CutMixUpAugment with element=False dramatically improved the early-epoch performance.

You can look at the documentation to see examples of element=False and element=True. element=True mixes MixUp, CutMix, and additional augmentations within the same batch, while element=False selects one of the three per batch.
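For example (only the element keyword is taken from this thread; any other arguments are whatever defaults the library ships):

from fastxtend.vision.all import *  # assumed to export CutMixUpAugment

# element=True: MixUp, CutMix, and augmentations are mixed within a single batch
cutmixup_element = CutMixUpAugment(element=True)

# element=False: one of MixUp, CutMix, or augmentations is chosen for the whole batch
cutmixup_batch = CutMixUpAugment(element=False)

# Pass whichever variant via cbs=[...] to the Learner or to an individual fit call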