mit-han-lab / offsite-tuning

Offsite-Tuning: Transfer Learning without Full Model

Home Page: https://arxiv.org/abs/2302.04870


NotImplementedError

vis-face opened this issue · comments

When running offsite_tuning/run_image_classification.py, to_teacher in offsite_tuning/utils.py raises NotImplementedError. The function is:

def to_teacher(model, args):
    l = args.student_l_pad
    print(type(model))  # debug print: the output below shows a DistributedDataParallel wrapper
    if isinstance(model, OPTForCausalLM):
        r = len(model.model.decoder.layers) - args.student_r_pad
        model.model.decoder.layers = model.model.decoder.layers[:l] + \
            model.teacher + model.model.decoder.layers[r:]
    elif isinstance(model, GPT2LMHeadModel):
        r = len(model.transformer.h) - args.student_r_pad
        model.transformer.h = model.transformer.h[:l] + \
            model.teacher + model.transformer.h[r:]
    elif isinstance(model, BloomForCausalLM):
        r = len(model.transformer.h) - args.student_r_pad
        model.transformer.h = model.transformer.h[:l] + \
            model.teacher + model.transformer.h[r:]
    elif isinstance(model, ViTForImageClassification):
        r = len(model.vit.encoder.layer) - args.student_r_pad
        model.vit.encoder.layer = model.vit.encoder.layer[:l] + \
            model.teacher + model.vit.encoder.layer[r:]
    elif isinstance(model, CLIPViTForImageClassification):
        r = len(model.vit.encoder.layers) - args.student_r_pad
        model.vit.encoder.layers = model.vit.encoder.layers[:l] + \
            model.teacher + model.vit.encoder.layers[r:]
    elif isinstance(model, EVAViTForImageClassification):
        r = len(model.blocks) - args.student_r_pad
        model.blocks = model.blocks[:l] + \
            model.teacher + model.blocks[r:]
    else:
        raise NotImplementedError

<class 'torch.nn.parallel.distributed.DistributedDataParallel'>
Traceback (most recent call last):
  File "offsite_tuning/run_image_classification.py", line 564, in <module>
    main()
  File "offsite_tuning/run_image_classification.py", line 413, in main
    model = to_teacher(model, args)
  File "/root/paddlejob/workspace/env_run/offsite-tuning-main/offsite_tuning/utils.py", line 714, in to_teacher
    raise NotImplementedError
NotImplementedError
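
For context: the isinstance dispatch in to_teacher only matches the raw HuggingFace model classes, so a model that has already been wrapped in DistributedDataParallel (as the debug print shows) falls through to raise NotImplementedError. Below is a minimal sketch of one possible workaround, unwrapping the DDP container before the call; unwrap_ddp is a hypothetical helper, not part of the repo:

from torch.nn.parallel import DistributedDataParallel

def unwrap_ddp(model):
    # DDP keeps the original model in its .module attribute
    return model.module if isinstance(model, DistributedDataParallel) else model

# hypothetical call site, mirroring main() in run_image_classification.py:
# model = to_teacher(unwrap_ddp(model), args)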

I am hitting the same problem...

08/14/2023 19:46:09 - INFO - main - Total train batch size (w. parallel, distributed & accumulation) = 64
08/14/2023 19:46:09 - INFO - main - Gradient Accumulation steps = 1
08/14/2023 19:46:09 - INFO - main - Total optimization steps = 32
Traceback (most recent call last):
  File "/home/bufang/offsite-tuning/offsite_tuning/run_image_classification.py", line 564, in <module>
    main()
  File "/home/bufang/offsite-tuning/offsite_tuning/run_image_classification.py", line 411, in main
    model = to_student(model, args)
            ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/bufang/offsite-tuning/offsite_tuning/utils.py", line 743, in to_student
    raise NotImplementedError
NotImplementedError
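
to_student at offsite_tuning/utils.py line 743 uses the same isinstance dispatch, so the cause appears to be the same: by the time main() reaches line 411 the model has already been wrapped for distributed training, and none of the supported classes match. A rough sketch of the alternative ordering, assuming an Accelerate-style prepare call is what wraps the model (variable names are illustrative, not the script's actual ones):

# convert the plain HuggingFace model while it is still unwrapped ...
model = to_student(model, args)
# ... and only then hand it to the distributed wrapper
model = accelerator.prepare(model)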