facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

How to finetune from pretrained detectron models with different number of classes?

wangg12 opened this issue · comments

❓ Questions and Help

Is there a config option to load pretrained COCO models for finetuning? The number of classes in the last layers may be different, so those weights should not be loaded.

Hi,

There currently isn't an off-the-shelf option in the config for that.
I see two easy options:
1 - from a Python interpreter, load the pre-trained files that you want to use, and delete from the state_dict the keys corresponding to the last layer. The exact naming depends on the model architecture, but for boxes the names will end with cls_score and bbox_pred, and for masks with mask_fcn_logits.
2 - Clone the code base and rename the two variables that I pointed out to something else, like cls_score_mine etc. This will work out of the box, and you can modify NUM_CLASSES in the config without clashes.
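For illustration, a minimal sketch of option 1, assuming a weights file saved as dict(model=state_dict); the paths are hypothetical and the exact key suffixes depend on your architecture, as noted above:

import torch

weights = torch.load("base_model.pth")  # hypothetical path
state_dict = weights["model"]

# drop the class-dependent last layers so NUM_CLASSES can be changed freely
class_dependent_suffixes = ("cls_score.weight", "cls_score.bias",
                            "bbox_pred.weight", "bbox_pred.bias",
                            "mask_fcn_logits.weight", "mask_fcn_logits.bias")
trimmed = {k: v for k, v in state_dict.items()
           if not k.endswith(class_dependent_suffixes)}

torch.save(dict(model=trimmed), "trimmed_model.pth")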

I think we could provide a functionality to perform 1 for the users, given a cfg file and a path to a model weight. That could be a possible improvement on top of what we currently have.

What do you think?

@fmassa I think option 1 is more user-friendly. We could add a config option like PRETRAINED_DETECTRON_WEIGHTS, and if it is given, all the weights except those of the last layer would be loaded to initialize the model.

Yeah, option 1 is definitely simpler for the user (even if there are only a few lines to change here and there ;-) )

I'll prepare a PR adding support for this functionality, but I'm not 100% sure of what the API should look like, nor the best fix for it.

API

Should we have a function that acts on the weights and creates a new set of weights file? Or should we add an extra config argument, to make it a single step function? If we add an argument (which seems simpler for the user), would it be ambiguous?

Implementation

For the possible fixes, we could hard-code the possible names for the layers that shouldn't be loaded (as I mentioned before). But this is not super robust if the user changes their module names (which they can, if they want).

Another possible implementation is to not load the weights for the entire predictor. This is effectively the most robust way, as the predictor was designed to be only the "last layer".
This works nicely for boxes, but for masks we would also lose the initialization of one ConvTranspose2d layer, which might not be that bad in the end.
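A rough sketch of what skipping the whole predictor could look like, assuming the default module paths in this code base (roi_heads.box.predictor and roi_heads.mask.predictor; verify against your model's state_dict):

def drop_predictor_weights(state_dict):
    # keep every weight except those belonging to the predictor modules;
    # the prefixes below are the assumed defaults, not guaranteed names
    predictor_prefixes = ("roi_heads.box.predictor.", "roi_heads.mask.predictor.")
    return {k: v for k, v in state_dict.items()
            if not k.startswith(predictor_prefixes)}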

Thoughts?

I would prefer the former way. As for possible module-name changes by users, I think they should also be careful about weight loading, either by name remapping or random initialization.

@wangg12 could you expand on why you'd prefer the first approach? I was actually leaning more towards the second one, as it is more robust, and we have a clear contract with the user when we add an option to the config: "load every weight possible, except those in the predictor".

@fmassa There are two situations where the first one may be more suitable:

  1. I just want to finetune the trained COCO model on COCO datasets.
  2. I want to reuse as many pretrained weights as I can, so losing the ConvTranspose2d weights may be unexpected.

For other situations, I think the second way is also OK.

So, I've discussed with a few people here and it seems that the best way of handling this would be to actually perform model surgery on the model files.

For example, the best results on Cityscapes come from taking a COCO-trained detector and removing most of the classification and mask weights, while retaining those that correspond to categories common to both COCO and Cityscapes.
Detectron does something along those lines: https://github.com/facebookresearch/Detectron/blob/master/tools/convert_coco_model_to_cityscapes.py , so maybe the most generic thing to do is to provide a few helper functions that let users decide which layers to trim.
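As a sketch of that kind of surgery on the box classification head (the class-index mapping is hypothetical and has to be built from the two datasets' category lists; bbox_pred, with 4 values per class, would need analogous treatment):

import torch

def remap_cls_score(state_dict, class_map, num_new_classes):
    # class_map: {new_class_index: old_class_index} for the shared categories
    # (index 0 is the background class in both datasets)
    old_w = state_dict["cls_score.weight"]  # shape (num_old_classes, feat_dim)
    old_b = state_dict["cls_score.bias"]
    new_w = old_w.new_zeros((num_new_classes, old_w.size(1)))
    new_b = old_b.new_zeros(num_new_classes)
    for new_idx, old_idx in class_map.items():
        new_w[new_idx] = old_w[old_idx]  # copy the row for a shared category
        new_b[new_idx] = old_b[old_idx]
    state_dict["cls_score.weight"] = new_w
    state_dict["cls_score.bias"] = new_b
    return state_dict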

Yes, this way is more general.

"load the pre-trained files that you want to use, and delete from the state_dict"

Hi,

There currently isn't an off-the-shelf option in the config for that.
I see two easy options:
1 - from a Python interpreter, load the pre-trained files that you want to use, and delete from the state_dict the keys corresponding to the last layer. The exact naming depends on the model architecture, but for boxes the names will end with cls_score and bbox_pred, and for masks with mask_fcn_logits.
2 - Clone the code base and rename the two variables that I pointed out to something else, like cls_score_mine etc. This will work out of the box, and you can modify NUM_CLASSES in the config without clashes.

I think we could provide a functionality to perform 1 for the users, given a cfg file and a path to a model weight. That could be a possible improvement on top of what we currently have.

What do you think?

Where are the pretrained files located? For example, I want to use a net pretrained on ImageNet; where can we find those files and load them?

By default, they are stored in ~/.torch/models. The exact name of the file is printed during training, just before the printing of the loaded weights.

I added this function to train_net.py with an additional input arg. Note: the loaded models had an extra "module." prefix that had to be removed. After I removed it, this worked great.

import torch

def _transfer_pretrained_weights(model, pretrained_model_pth):
    # the checkpoint stores the weights under the 'model' key
    pretrained_weights = torch.load(pretrained_model_pth)['model']
    # strip the DataParallel 'module.' prefix and drop the class-dependent layers
    new_dict = {k.replace('module.', ''): v for k, v in pretrained_weights.items()
                if 'cls_score' not in k and 'bbox_pred' not in k}
    # overwrite the freshly initialized weights with every surviving pretrained one
    this_state = model.state_dict()
    this_state.update(new_dict)
    model.load_state_dict(this_state)
    return model

I don't think this is the solution that @fmassa wants to implement but it'll work in a pinch for now.

Hello @steve-goley @fmassa, I've tried to load the pretrained model in this way:
w = torch.load("X-101-32x8d.pkl")

however, an error occurred: UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 2: ordinal not in range(128)
I am able to get past this error by using pickle:
with open("X-101-32x8d.pkl", "rb") as f: w = pickle.load(f, encoding='latin1')

But there seems to be no "model" key in the dict, just a "blobs" dict, and I can't find 'cls_score' and 'bbox_pred'.

Could you tell me how to overcome this issue?

Thanks

@antocapp the .pkl files are generally from the Detectron codebase, which is written in Caffe2.

What I'd recommend doing is the following:
1 - create a cfg object similar to what is present in the demo, for that particular model
2 - use the load_c2_format function, which will give you a dict containing the model field. In there, you can perform the model surgery that you want, by removing fields etc.
3 - save the object using PyTorch's torch.save, keeping the structure dict(model=state_dict).
4 - change MODEL.WEIGHT to point to this saved file.

Let me know if it doesn't work, I might have missed a step here.
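Putting the four steps together, a sketch (the config file and paths are examples; the config must match the weights, and the removed key names are the assumed defaults):

import torch
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

# 1 - config matching the pretrained detection model (example file)
cfg.merge_from_file("configs/caffe2/e2e_mask_rcnn_X_101_32x8d_FPN_1x_caffe2.yaml")

# 2 - load the Detectron .pkl; returns a dict with a 'model' field
_d = load_c2_format(cfg, "model_final.pkl")  # hypothetical path

# model surgery: drop the class-dependent layers
for key in list(_d["model"].keys()):
    if "cls_score" in key or "bbox_pred" in key or "mask_fcn_logits" in key:
        del _d["model"][key]

# 3 - save, keeping the dict(model=state_dict) structure
torch.save(_d, "trimmed_model.pth")

# 4 - finally, point MODEL.WEIGHT in the config to trimmed_model.pth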

Hi @fmassa, thanks for your support.
I wrote this:

from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

cfg.merge_from_file("configs/caffe2/e2e_mask_rcnn_X_101_32x8d_FPN_1x_caffe2.yaml")
path = '/home/antonio/.torch/models/X-101-32x8d.pkl'
_d = load_c2_format(cfg, path)

keys = [k for k in _d['model'].keys()]
print(sorted(keys))

But I can't find 'cls_score' and 'bbox_pred' in the keys.

@antocapp you are loading the ImageNet-trained model (X-101-32x8d.pkl), not the detection model that has already been trained on COCO (which is probably what you want). The model file that you are looking for has a long name; it should start with _ and parts of it are here.

Thanks @fmassa, so where can I find that model? When I performed inference with that model it worked very well (I just want to fine-tune it on one class of a specific dataset), but in .torch/models/ I see that only "X-101-32x8d.pkl" has been downloaded. Where can I find the detection model?

Thanks for your help, I really appreciate it.

EDIT: I launched inference again and it started re-downloading the file 36761843/12_2017_baselines/e2e_mask_rcnn_X-101-32x8d-FPN_1x.yaml.06_35_59.RZotkLKI/output/train/coco_2014_train%3Acoco_2014_valminusminival/generalized_rcnn/model_final.pkl ; maybe I accidentally deleted the previous model from the models/ folder. Thanks again!
I was able to prune the 'cls_score' and 'bbox_pred' layers in the model, then saved it (keeping the 'model' key) as a .pth with torch.save. Then I changed MODEL.WEIGHT to point to this file and ROI_BOX_HEAD.NUM_CLASSES to 2 (background and the only class that I want to fine-tune the model for). Is this correct?

A last question: how should I organize my dataset in order to fine tune the model?

Hi @antocapp,
Could you please share your chunk of code that takes the pre-trained Mask R-CNN model (beginning with _) and returns the modified one (pruning the relevant fields)?
I am running into the same issues you mentioned in:

Hello @steve-goley @fmassa, I've tried to load the pretrained model in this way:
w = torch.load("X-101-32x8d.pkl")

however, an error occurred: UnicodeDecodeError: 'ascii' codec can't decode byte 0xad in position 2: ordinal not in range(128)
I am able to get past this error by using pickle:
with open("X-101-32x8d.pkl", "rb") as f: w = pickle.load(f, encoding='latin1')

But there seems to be no "model" key in the dict, just a "blobs" dict, and I can't find 'cls_score' and 'bbox_pred'.

Could you tell me how to overcome this issue?

Thanks

Thank you very much

@BelhalK the weights are inside blobs, but they have some pretty different names.

Got it. So the working function should be

def _transfer_pretrained_weights(model, pretrained_model_pth):
    pretrained_weights = torch.load(pretrained_model_pth)['blobs']
    new_dict = {k.replace('module.', ''): v for k, v in pretrained_weights.items()
                if 'somethingelse' not in k and 'somethingelse' not in k}
    this_state = model.state_dict()
    this_state.update(new_dict)
    model.load_state_dict(this_state)
    return model

Where somethingelse should be something other than cls_score and bbox_pred, right?

Almost, you'll probably need to plug it somewhere in utils/c2_loading

You may be right.
I initially wanted to insert it in tools/train_net.py, like:

def _transfer_pretrained_weights(model, pretrained_model_pth):
    pretrained_weights = torch.load(pretrained_model_pth)['model']
    new_dict = {k.replace('module.',''):v for k, v in pretrained_weights.items()
                if 'cls_score' not in k and 'bbox_pred' not in k}
    this_state = model.state_dict()
    this_state.update(new_dict)
    model.load_state_dict(this_state)
    return model


def train(cfg, local_rank, distributed):
    old_model = build_detection_model(cfg)
    pretrained_model_pth = "/home/belhal/.torch/models/_detectron_35858933_12_2017_baselines_e2e_mask_rcnn_R-50-FPN_1x.yaml.01_48_14.DzEQe4wC_output_train_coco_2014_train%3Acoco_2014_valminusminival_generalized_rcnn_model_final.pkl"
    model = _transfer_pretrained_weights(old_model,pretrained_model_pth)
    device = torch.device(cfg.MODEL.DEVICE)
    model.to(device)
   ....

But it may be necessary in some other scripts

I have been using the various tips and tricks from this thread to modify a pre-trained model.
I am having an issue saving the modified dict as a new model.
I am using the following code:

from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

path = '/Users/belhal/.torch/models/_detectron_35858933_12_2017_baselines_e2e_mask_rcnn_R-50-FPN_1x.yaml.01_48_14.DzEQe4wC_output_train_coco_2014_train%3Acoco_2014_valminusminival_generalized_rcnn_model_final.pkl'

cfg.merge_from_file("../configs/e2e_mask_rcnn_X_101_32x8d_FPN_1x.yaml")
_d = load_c2_format(cfg, path)
newdict = _d

def removekey(d, listofkeys):
    r = dict(d)
    for key in listofkeys:
        del r[key]
    return r

newdict['model'] = removekey(_d['model'], ['cls_score.bias','cls_score.weight','bbox_pred.bias','bbox_pred.weight'])

How should I use torch.save(??, 'mymodel.pkl') to save a new model named mymodel.pkl with the resulting dict newdict?

Thanks a lot for your help!

You can just save it using torch.save(newdict, 'mymodel.pth'). Note the pth extension, and not pkl

OK, and so this new .pth model can be pointed to in my config file (MODEL.WEIGHT) to run training?

Thanks for this!

Yes, you can point to the pth file in MODEL.WEIGHT and that should be enough

Following up on my training from a pre-trained model:
I now have a sizing issue on all my layers.
For instance, see the following error message:

size mismatch for backbone.body.layer3.22.conv1.weight: copying a param with shape torch.Size([64, 3, 7, 7]) from checkpoint, the shape in current model is torch.Size([1024, 1024, 1, 1]).
	size mismatch for backbone.body.layer3.22.bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for backbone.body.layer3.22.bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for backbone.body.layer4.0.conv1.weight: copying a param with shape torch.Size([512, 1024, 1, 1]) from checkpoint, the shape in current model is torch.Size([2048, 1024, 1, 1]).
	size mismatch for backbone.body.layer4.0.bn1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).
	size mismatch for backbone.body.layer4.0.bn1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([2048]).

I am running a simple command
python tools/train_net.py --config-file "configs/myconfig.yaml"
with a model modified so that I could train on it (as mentioned above), and where the myconfig.yaml file has the adapted num_classes (=2) and points to the model.pth.

Any ideas on how to adapt those sizes?

@BelhalK what have you modified in your model, only the last layer or other layers as well?

  • If it's only the last layer, the discussion above has the solution for you.
  • If it's all the layers, then it will be difficult to reuse a pre-trained model.

I believe I modified only the last layers, cls_score and bbox_pred, but you are saying that I might have modified all of them.
I will check whether that's the case. Indeed, it makes sense to only modify the last layer, of course.
Thanks

I've actually only changed
'bbox_pred.bias', 'bbox_pred.weight', 'cls_score.bias', 'cls_score.weight'
in the pre-trained model _detectron_35858933_12_2017_baselines_e2e_mask_rcnn_R-50-FPN_1x.yaml.01_48_14.DzEQe4wC_output_train_coco_2014_train%3Acoco_2014_valminusminival_generalized_rcnn_model_final.pkl
and did not change any values of the form (for instance for layer 1.2)

'layer1.2.bn1.bias', 'layer1.2.bn1.weight', 'layer1.2.bn2.bias', 'layer1.2.bn2.weight', 'layer1.2.bn3.bias', 'layer1.2.bn3.weight', 'layer1.2.conv1.bias', 'layer1.2.conv1.weight', 'layer1.2.conv2.bias', 'layer1.2.conv2.weight', 'layer1.2.conv3.bias', 'layer1.2.conv3.weight'

Should I maybe try another pre-trained model?

Can you check the top of the log and verify that the mapping from original names to saved names is correct? Another possibility is that you are picking the wrong model config for the weights that you have.

@fmassa do you mind summarizing the steps we need to take to train on a dataset with two classes and background? I tried to follow this issue, but I'm still a bit lost. Any help is much appreciated!

@jbitton addressed your question in #273

Also, given that the current issues were not enough to give you full context on how to add new datasets, could you perhaps improve a bit the documentation in https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/data/README.md (maybe adding a link from the main README as well) with the points that were missing, and send a PR?

It would be a very welcome contribution!

@fmassa For sure! Do you mind if I get the PR out mid-next week? I'd like to first verify that I was able to go through the training/eval scripts successfully.

@jbitton sure, no worries! thanks a lot!

What's the meaning of %3A in the saved path? It's the percent-encoding for a colon, but why do we want it in a path?

@mattans we don't necessarily want it in the path. But this might be specific to what Windows can have as characters in a path

To summarize, I've created a script tools/trim_detectron_model.py here.
You can decide which keys to remove and which to keep by modifying the script.

Then you can simply point the converted model path in the config file by changing MODEL.WEIGHT.
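For readers who can't follow the link, the script is roughly along these lines (a sketch, not the script itself; the argument names are assumptions):

import argparse
import torch
from maskrcnn_benchmark.config import cfg
from maskrcnn_benchmark.utils.c2_model_loading import load_c2_format

def removekey(d, listofkeys):
    r = dict(d)
    for key in listofkeys:
        print("key: {} is removed".format(key))
        r.pop(key)
    return r

parser = argparse.ArgumentParser(description="Trim a Detectron model for finetuning")
parser.add_argument("--pretrained_path", type=str, help="path to the detection .pkl")
parser.add_argument("--save_path", type=str, help="where to write the trimmed .pth")
args = parser.parse_args()

# merge the config matching your weights into cfg first, if needed
_d = load_c2_format(cfg, args.pretrained_path)
newdict = {"model": removekey(_d["model"],
                              ["cls_score.bias", "cls_score.weight",
                               "bbox_pred.bias", "bbox_pred.weight"])}
torch.save(newdict, args.save_path)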

@wangg12 could you maybe add a section in the TROUBLESHOOTING or in the README pointing to your snippet and send a PR?

Thanks!

@fmassa I've created a PR #286

I had a question about using trim_detectron_model.py.
If I understand correctly, when we load a model using load_c2_format(cfg, path), this function only works with .pkl files. However, what we save during training is a .pth file, so I got an error when I tried to use trim_detectron_model.py on a .pth file.

Is there any solution for this?
Thanks.

@xiaohai12 I believe you can just replace the call to load_c2_format with a simple torch.load, but I have not tested.

@xiaohai12 I believe you can just replace the call to load_c2_format with a simple torch.load, but I have not tested.

Thanks. I will try it.

@xiaohai12 I believe you can just replace the call to load_c2_format with a simple torch.load, but I have not tested.

It worked in my case when I changed load_c2_format to torch.load and changed the parameters in removekey from cls_score to roi_heads.box.predictor.cls_score (and similarly for the other parameters).
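For reference, that final variant might look like this (assuming a .pth checkpoint saved by this code base, where keys carry the full module path; the file paths are hypothetical):

import torch

keys_to_drop = {
    "roi_heads.box.predictor.cls_score.bias",
    "roi_heads.box.predictor.cls_score.weight",
    "roi_heads.box.predictor.bbox_pred.bias",
    "roi_heads.box.predictor.bbox_pred.weight",
}

_d = torch.load("last_checkpoint.pth")  # .pth saved during training
_d["model"] = {k: v for k, v in _d["model"].items() if k not in keys_to_drop}
torch.save(_d, "trimmed_checkpoint.pth")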