facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

How to finetune from pretrained detectron models with different number of classes?

wangg12 opened this issue · comments

❓ Questions and Help

Is there a config option to load pretrained COCO models for finetuning? The number of classes in the last layers may be different, so those weights should not be loaded.

Hi,

There currently isn't an off-the-shelf option in the config for that.
I see two easy options:
1 - from a Python interpreter, load the pre-trained files that you want to use, and delete from the state_dict the keys corresponding to the last layer. The exact naming depends on the model architecture, but for boxes the names will end with cls_score and bbox_pred, and for masks with mask_fcn_logits.
2 - Clone the code base and rename the two variables that I pointed out to something else, like cls_score_mine etc. This will work out of the box, and you can modify NUM_CLASSES in the config without clashes.
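For illustration, a minimal sketch of option 1, assuming a weights file saved as dict(model=state_dict); the paths are hypothetical and the exact key suffixes depend on your architecture, as noted above:

import torch

weights = torch.load("base_model.pth")  # hypothetical path
state_dict = weights["model"]

# drop the class-dependent last layers so NUM_CLASSES can be changed freely
class_dependent_suffixes = ("cls_score.weight", "cls_score.bias",
                            "bbox_pred.weight", "bbox_pred.bias",
                            "mask_fcn_logits.weight", "mask_fcn_logits.bias")
trimmed = {k: v for k, v in state_dict.items()
           if not k.endswith(class_dependent_suffixes)}

torch.save(dict(model=trimmed), "trimmed_model.pth")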

I think we could provide a functionality to perform 1 for the users, given a cfg file and a path to a model weight. That could be a possible improvement on top of what we currently have.

What do you think?

@fmassa I think option 1 is more user-friendly. We could add a config option like PRETRAINED_DETECTRON_WEIGHTS, and if it is given, all the weights except those of the last layer would be loaded to initialize the model.

Yeah, option 1 is definitely simpler for the user (even if there are only a few lines to change here and there ;-) )

I'll prepare a PR adding support for this functionality, but I'm not 100% sure of what the API should look like, nor the best fix for it.

API

Should we have a function that acts on the weights and creates a new set of weights file? Or should we add an extra config argument, to make it a single step function? If we add an argument (which seems simpler for the user), would it be ambiguous?

Implementation

For the possible fixes, we could hard-code the possible names for the layers that shouldn't be loaded (as I mentioned before). But this is not super robust if the user changes their module names (which they can, if they want).

Another possible implementation is to not load the weights for the entire predictor. This is effectively the most robust way, as the predictor was designed to be only the "last layer".
This works nicely for boxes, but for masks we would also lose the initialization of one ConvTranspose2d layer, which might not be that bad in the end.
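A rough sketch of what skipping the whole predictor could look like, assuming the default module paths in this code base (roi_heads.box.predictor and roi_heads.mask.predictor; verify against your model's state_dict):

def drop_predictor_weights(state_dict):
    # keep every weight except those belonging to the predictor modules;
    # the prefixes below are the assumed defaults, not guaranteed names
    predictor_prefixes = ("roi_heads.box.predictor.", "roi_heads.mask.predictor.")
    return {k: v for k, v in state_dict.items()
            if not k.startswith(predictor_prefixes)}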

Thoughts?

I would prefer the former way. As for possible module-name changes by users, I think they should also be careful about weight loading, either by name remapping or random initialization.

@wangg12 could you expand on why you'd prefer the first approach? I was actually leaning more towards the second one, as it is more robust, and we have a clear contract with the user when we add an option to the config: "load every weight possible, except those in the predictor".

@fmassa There are two situations where the first one may be more suitable:

  1. I just want to finetune the trained COCO model on COCO datasets.
  2. I want to reuse as many pretrained weights as I can, so losing the ConvTranspose2d weights may be unexpected.

For other situations, I think the second way is also OK.

So, I've discussed with a few people here and it seems that the best way of handling this would be to actually perform model surgery on the model files.

For example, the best results on Cityscapes come from taking a COCO-trained detector and removing most of the classification and mask weights, while retaining those that correspond to categories common to both COCO and Cityscapes.
Detectron does something along those lines: https://github.com/facebookresearch/Detectron/blob/master/tools/convert_coco_model_to_cityscapes.py , so maybe the most generic thing to do is to provide a few helper functions that let users decide which layers to trim.
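As a sketch of that kind of surgery on the box classification head (the class-index mapping is hypothetical and has to be built from the two datasets' category lists; bbox_pred, with 4 values per class, would need analogous treatment):

import torch

def remap_cls_score(state_dict, class_map, num_new_classes):
    # class_map: {new_class_index: old_class_index} for the shared categories
    # (index 0 is the background class in both datasets)
    old_w = state_dict["cls_score.weight"]  # shape (num_old_classes, feat_dim)
    old_b = state_dict["cls_score.bias"]
    new_w = old_w.new_zeros((num_new_classes, old_w.size(1)))
    new_b = old_b.new_zeros(num_new_classes)
    for new_idx, old_idx in class_map.items():
        new_w[new_idx] = old_w[old_idx]  # copy the row for a shared category
        new_b[new_idx] = old_b[old_idx]
    state_dict["cls_score.weight"] = new_w
    state_dict["cls_score.bias"] = new_b
    return state_dict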

Yes, this way is more general.

"load the pre-trained files that you want to use, and delete from the state_dict"

Hi,

There currently isn't an off-the-shelf option in the config for that.
I see two easy options:
1 - from a Python interpreter, load the pre-trained files that you want to use, and delete from the state_dict the keys corresponding to the last layer. The exact naming depends on the model architecture, but for boxes the names will end with cls_score and bbox_pred, and for masks with mask_fcn_logits.
2 - Clone the code base and rename the two variables that I pointed out to something else, like cls_score_mine etc. This will work out of the box, and you can modify NUM_CLASSES in the config without clashes.

I think we could provide a functionality to perform 1 for the users, given a cfg file and a path to a model weight. That could be a possible improvement on top of what we currently have.

What do you think?

Where are the pretrained files located? For example, I want to use a net pretrained on ImageNet; where can we find those files and load them?

By default, they are stored in ~/.torch/models. The exact name of the file is printed during training, just before the printing of the loaded weights.

I added this function to train_net.py with an additional input arg. Note: the loaded models had an extra "module." prefix that had to be removed. After I removed it, this worked great.

import torch

def _transfer_pretrained_weights(model, pretrained_model_pth):
    # the checkpoint stores the weights under the 'model' key
    pretrained_weights = torch.load(pretrained_model_pth)['model']
    # strip the DataParallel 'module.' prefix and drop the class-dependent layers
    new_dict = {k.replace('module.', ''): v for k, v in pretrained_weights.items()
                if 'cls_score' not in k and 'bbox_pred' not in k}
    # overwrite the freshly initialized weights with every surviving pretrained one
    this_state = model.state_dict()
    this_state.update(new_dict)
    model.load_state_dict(this_state)
    return model

I don't think this is the solution that @fmassa wants to implement but it'll work in a pinch for now.

Hello @steve-goley @fmassa, I've tried to load the pretrained model in this way:
w = torch.load("X-101-32x8d.pkl")

however, an error occurred: UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 2: ordinal not in range(128)
I am able to get past this error by using pickle:
with open("X-101-32x8d.pkl", "rb") as f: w = pickle.load(f, encoding='latin1')

But there seems to be no "model" key in the dict, just a "blobs" dict, and I can't find 'cls_score' and 'bbox_pred'.

Could you tell me how to overcome this issue?

Thanks

@antocapp the .pkl files are generally from the Detectron codebase, which is written in Caffe2.

What I'd recommend doing is the following:
1 - create a cfg object similar to what is present in the demo, for that particular model
2 - use the load_c2_format function, which will give you a dict containing the model field. In there, you can perform the model surgery that you want, by removing fields etc.
3 - save the object using PyTorch's torch.save, keeping the structure dict(model=state_dict).
4 - change MODEL.WEIGHT to point to this saved file.

Let me know if it doesn't work, I might have missed a step here.
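Putting the four steps together, a sketch (the config file and paths are examples; the config must match the weights, and the removed key names are the assumed defaults):

import torch
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

# 1 - config matching the pretrained detection model (example file)
cfg.merge_from_file("configs/caffe2/e2e_mask_rcnn_X_101_32x8d_FPN_1x_caffe2.yaml")

# 2 - load the Detectron .pkl; returns a dict with a 'model' field
_d = load_c2_format(cfg, "model_final.pkl")  # hypothetical path

# model surgery: drop the class-dependent layers
for key in list(_d["model"].keys()):
    if "cls_score" in key or "bbox_pred" in key or "mask_fcn_logits" in key:
        del _d["model"][key]

# 3 - save, keeping the dict(model=state_dict) structure
torch.save(_d, "trimmed_model.pth")

# 4 - finally, point MODEL.WEIGHT in the config to trimmed_model.pth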

Hi @fmassa, thanks for your support.
I wrote this:

from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

cfg.merge_from_file("configs/caffe2/e2e_mask_rcnn_X_101_32x8d_FPN_1x_caffe2.yaml")
path = '/home/antonio/.torch/models/X-101-32x8d.pkl'
_d = load_c2_format(cfg, path)

keys = [k for k in _d['model'].keys()]
print(sorted(keys))

But I can't find 'cls_score' and 'bbox_pred' in the keys.

@antocapp you are loading the ImageNet-trained model (X-101-32x8d.pkl), not the detection model that has already been trained on COCO (which is probably what you want). The model file that you are looking for has a long name; it should start with _ and parts of it are here.

Thanks @fmassa, so where can I find that model? When I performed inference with that model it worked very well (I just want to fine-tune it on one class of a specific dataset), but in .torch/models/ I see that only "X-101-32x8d.pkl" has been downloaded. Where can I find the detection model?

Thanks for your help, I really appreciate it.

EDIT: I launched inference again and it started re-downloading the file 36761843/12_2017_baselines/e2e_mask_rcnn_X-101-32x8d-FPN_1x.yaml.06_35_59.RZotkLKI/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl ; maybe I accidentally deleted the previous model from the models/ folder. Thanks again!
I was able to prune the 'cls_score' and 'bbox_pred' layers in the model, then saved it (keeping the 'model' key) as a .pth with torch.save. Then I changed MODEL.WEIGHT to point to this file and ROI_BOX_HEAD.NUM_CLASSES to 2 (background and the only class that I want to fine-tune the model for). Is this correct?

A last question: how should I organize my dataset in order to fine tune the model?

Hi @antocapp,
Could you please share your chunk of code that takes the pre-trained Mask R-CNN model (beginning with _) and returns the modified one (pruning the relevant fields)?
I am running into the same issues you mentioned in:

Hello @steve-goley @fmassa, I've tried to load the pretrained model in this way:
w = torch.load("X-101-32x8d.pkl")

however, an error occurred: UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 2: ordinal not in range(128)
I am able to get past this error by using pickle:
with open("X-101-32x8d.pkl", "rb") as f: w = pickle.load(f, encoding='latin1')

But there seems to be no "model" key in the dict, just a "blobs" dict, and I can't find 'cls_score' and 'bbox_pred'.

Could you tell me how to overcome this issue?

Thanks

Thank you very much

@BelhalK the weights are inside blobs, but they have some pretty different names.

Got it. So the working function should be

def _transfer_pretrained_weights(model, pretrained_model_pth):
    pretrained_weights = torch.load(pretrained_model_pth)['blobs']
    new_dict = {k.replace('module.', ''): v for k, v in pretrained_weights.items()
                if 'somethingelse' not in k and 'somethingelse' not in k}
    this_state = model.state_dict()
    this_state.update(new_dict)
    model.load_state_dict(this_state)
    return model

Where somethingelse should be something other than cls_score and bbox_pred, right?

Almost, you'll probably need to plug it somewhere in utils/c2_loading

You may be right.
I initially wanted to insert it in tools/train_net.py, like:

def _transfer_pretrained_weights(model, pretrained_model_pth):
    pretrained_weights = torch.load(pretrained_model_pth)['model']
    new_dict = {k.replace('module.',''):v for k, v in pretrained_weights.items()
                if 'cls_score' not in k and 'bbox_pred' not in k}
    this_state = model.state_dict()
    this_state.update(new_dict)
    model.load_state_dict(this_state)
    return model


def train(cfg, local_rank, distributed):
    old_model = build_detection_model(cfg)
    pretrained_model_pth = "/home/belhal/.torch/models/_detectron_35858933_12_2017_baselines_e2e_mask_rcnn_R-50-FPN_1x.yaml.01_48_14.DzEQe4wC_output_train_coco_2014_train%3Acoco_2014_valminusminival_generalized_rcnn_model_final.pkl"
    model = _transfer_pretrained_weights(old_model,pretrained_model_pth)
    device = torch.device(cfg.MODEL.DEVICE)
    model.to(device)
   ....

But it may be necessary in some other scripts

I have been using the various tips and tricks from this thread to modify a pre-trained model.
I am having an issue saving the modified dict as a new model.
I am using the following code:

from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

path = '/Users/belhal/.torch/models/_detectron_35858933_12_2017_baselines_e2e_mask_rcnn_R-50-FPN_1x.yaml.01_48_14.DzEQe4wC_output_train_coco_2014_train%3Acoco_2014_valminusminival_generalized_rcnn_model_final.pkl'

cfg.merge_from_file("../configs/e2e_mask_rcnn_X_101_32x8d_FPN_1x.yaml")
_d = load_c2_format(cfg, path)
newdict = _d

def removekey(d, listofkeys):
    r = dict(d)
    for key in listofkeys:
        del r[key]
    return r

newdict['model'] = removekey(_d['model'], ['cls_score.bias','cls_score.weight','bbox_pred.bias','bbox_pred.weight'])

How should I use torch.save(??, 'mymodel.pkl') to save a new model named mymodel.pkl with the resulting dict newdict?

Thanks a lot for your help!

You can just save it using torch.save(newdict, 'mymodel.pth'). Note the pth extension, and not pkl

OK, and so this new .pth model can be pointed to in my config file (MODEL.WEIGHT) to run training?

Thanks for this!

Yes, you can point to the pth file in MODEL.WEIGHT and that should be enough

Following up on my training from a pre-trained model:
I now have a sizing issue on all my layers.
For instance, see the following error message:

size mismatch for backbone.body.layer3.22.conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1, 1]).
	size mismatch for backbone.body.layer3.22.bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for backbone.body.layer3.22.bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for backbone.body.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([2048, 1024, 1, 1]).
	size mismatch for backbone.body.layer4.0.bn1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
	size mismatch for backbone.body.layer4.0.bn1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).

I am running a simple command
python tools/train_net.py --config-file "configs/myconfig.yaml"
with a model modified so that I could train on it (as mentioned above), and where the myconfig.yaml file has the adapted num_classes (=2) and points to the model.pth.

Any ideas on how to adapt those sizes?

@BelhalK what have you modified in your model, only the last layer or other layers as well?

  • If it's only the last layer, the discussion above has the solution for you.
  • If it's all the layers, then it will be difficult to reuse a pre-trained model.

I believe I modified only the last layers, cls_score and bbox_pred, but you are saying that I might have modified all of them.
I will check whether that's the case. Indeed, it makes sense to only modify the last layer, of course.
Thanks

I've actually only changed
'bbox_pred.bias', 'bbox_pred.weight', 'cls_score.bias', 'cls_score.weight'
in the pre-trained model _detectron_35858933_12_2017_baselines_e2e_mask_rcnn_R-50-FPN_1x.yaml.01_48_14.DzEQe4wC_output_train_coco_2014_train%3Acoco_2014_valminusminival_generalized_rcnn_model_final.pkl
and did not change any values of the form (for instance for layer 1.2)

'layer1.2.bn1.bias', 'layer1.2.bn1.weight', 'layer1.2.bn2.bias', 'layer1.2.bn2.weight', 'layer1.2.bn3.bias', 'layer1.2.bn3.weight', 'layer1.2.conv1.bias', 'layer1.2.conv1.weight', 'layer1.2.conv2.bias', 'layer1.2.conv2.weight', 'layer1.2.conv3.bias', 'layer1.2.conv3.weight'

Should I maybe try another pre-trained model?

Can you check the top of the log and verify that the mapping from original names to saved names is correct? Another possibility is that you are picking the wrong model config for the weights that you have.

@fmassa do you mind summarizing the steps we need to take to train on a dataset with two classes and background? I tried to follow this issue, but I'm still a bit lost. Any help is much appreciated!

@jbitton addressed your question in #273

Also, given that the current issues were not enough to give you full context on how to add new datasets, could you perhaps improve a bit the documentation in https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/data/README.md (maybe adding a link from the main README as well) with the points that were missing, and send a PR?

It would be a very welcome contribution!

@fmassa For sure! Do you mind if I get the PR out mid-next week? I'd like to first verify that I was able to go through the training/eval scripts successfully.

@jbitton sure, no worries! thanks a lot!

What's the meaning of %3A in the saved path? It's the percent-encoding for a colon, but why do we want it in a path?

@mattans we don't necessarily want it in the path. But this might be specific to what Windows can have as characters in a path

To summarize, I've created a script tools/trim_detectron_model.py here.
You can decide which keys to remove and which to keep by modifying the script.

Then you can simply point the converted model path in the config file by changing MODEL.WEIGHT.
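For readers who can't follow the link, the script is roughly along these lines (a sketch, not the script itself; the argument names are assumptions):

import argparse
import torch
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

def removekey(d, listofkeys):
    r = dict(d)
    for key in listofkeys:
        print("key: {} is removed".format(key))
        r.pop(key)
    return r

parser = argparse.ArgumentParser(description="Trim a Detectron model for finetuning")
parser.add_argument("--pretrained_path", type=str, help="path to the detection .pkl")
parser.add_argument("--save_path", type=str, help="where to write the trimmed .pth")
args = parser.parse_args()

# merge the config matching your weights into cfg first, if needed
_d = load_c2_format(cfg, args.pretrained_path)
newdict = {"model": removekey(_d["model"],
                              ["cls_score.bias", "cls_score.weight",
                               "bbox_pred.bias", "bbox_pred.weight"])}
torch.save(newdict, args.save_path)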

@wangg12 could you maybe add a section in the TROUBLESHOOTING or in the README pointing to your snippet and send a PR?

Thanks!

@fmassa I've created a PR #286

I had a question about using trim_detectron_model.py.
If I understand correctly, when we load a model using load_c2_format(cfg, path), this function only works with .pkl files. However, what we save during training is a .pth file, so I got an error when I tried to use trim_detectron_model.py on a .pth file.

Is there any solution for this?
Thanks.

@xiaohai12 I believe you can just replace the call to load_c2_format with a simple torch.load, but I have not tested.

@xiaohai12 I believe you can just replace the call to load_c2_format with a simple torch.load, but I have not tested.

Thanks. I will try it.

@xiaohai12 I believe you can just replace the call to load_c2_format with a simple torch.load, but I have not tested.

It worked in my case when I changed load_c2_format to torch.load and changed the parameters in removekey from cls_score to roi_heads.box.predictor.cls_score (and similarly for the other parameters).
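For reference, that final variant might look like this (assuming a .pth checkpoint saved by this code base, where keys carry the full module path; the file paths are hypothetical):

import torch

keys_to_drop = {
    "roi_heads.box.predictor.cls_score.bias",
    "roi_heads.box.predictor.cls_score.weight",
    "roi_heads.box.predictor.bbox_pred.bias",
    "roi_heads.box.predictor.bbox_pred.weight",
}

_d = torch.load("last_checkpoint.pth")  # .pth saved during training
_d["model"] = {k: v for k, v in _d["model"].items() if k not in keys_to_drop}
torch.save(_d, "trimmed_checkpoint.pth")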