tianyic / only_train_once

OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM

Home Page: https://openreview.net/pdf?id=7ynoX1ojPMt

How to resume?

rotorliu opened this issue

Hi, @tianyic
I am running OTO on my project. However, when training resumes from a checkpoint, I get the error "ValueError: loaded state dict has a different number of parameter groups".

Thanks for reaching out.

You can resume training as follows.

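import torch
from only_train_once import OTO

# Assumes the checkpoint stores the whole model object, i.e. it was saved with torch.save(model, checkpoint_path).
model = torch.load(checkpoint_path)
dummy_input = sth  # placeholder: a tensor shaped like the model's input
oto = OTO(model=model, dummy_input=dummy_input)
optimizer = oto.dhspg(
    # set lr, start_pruning, etc. as usual
    fixed_zero_groups=True,  # to preserve the group sparsity learnt in the previous training round
)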
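
For context, the message quoted in the question is raised by PyTorch's own `Optimizer.load_state_dict` when the optimizer being restored has a different number of parameter groups than the one that was saved. The sketch below reproduces it with plain `torch.optim.SGD` and a hypothetical two-layer model; it does not use OTO or DHSPG, and whether this is exactly what happened in the original project is an assumption.

```python
import torch

# Hypothetical two-layer model, used only to illustrate the error.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 2))

# Optimizer as it existed when the checkpoint was written:
# one parameter group per layer, so two groups in total.
saved_optimizer = torch.optim.SGD(
    [{"params": layer.parameters()} for layer in model], lr=0.1
)
saved_state = saved_optimizer.state_dict()

# Optimizer rebuilt on resume: all parameters in a single group.
resumed_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

try:
    resumed_optimizer.load_state_dict(saved_state)
    error_message = None
except ValueError as exc:
    # The group counts differ (2 saved vs. 1 rebuilt), so loading is rejected.
    error_message = str(exc)

print(error_message)
```

If this is the cause, rebuilding the optimizer from the reloaded model, as in the reply above, sidesteps the mismatch because no stale optimizer state is loaded.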

I have opened a pull request that adds support for resuming:
#11

Thanks for the PR!