rbgirshick / py-faster-rcnn

Faster R-CNN (Python implementation) -- see https://github.com/ShaoqingRen/faster_rcnn for the official MATLAB version

How to train Faster R-CNN on my own dataset?

JohnnyY8 opened this issue

Hi everyone:
I want to train Faster R-CNN on my own dataset. Because Faster R-CNN does not use the selective search method, I commented out the code related to selective search. However, there are still some errors about roidb and so on.
Can anybody help me? I am not quite sure what I should do to train Faster R-CNN. It is a little complicated for me.
Thanks so much!

@JohnnyY8

Hi, I did the same thing. First you should work through the code and figure out where and which functions are called, and you should try demo.py. Afterwards, the readme has a section called "Beyond the demo" which explains the basic procedure.

Additionally, you should search the issues in this repo. There are actually quite a lot of similar issues that ask the same question.

Furthermore, here is a really good tutorial on how to train on your own dataset. It helped me a lot.

Finally, I'll sum up the main steps for you:

  1. Copy the structure of the pascal voc dataset into FRCN_ROOT/data/, create a symbolic link, and arrange your data in the same manner as the pascal voc dataset. That is actually the best way to avoid large code changes in the following steps.
  2. Create a FRCN_ROOT/lib/datasets/<your_dataset>.py and a <your_dataset>_eval.py corresponding to pascal_voc.py and voc_eval.py.
  3. Update FRCN_ROOT/lib/datasets/factory.py by adding a new entry for your own dataset (see the sketch after this list).
  4. Adapt the models under FRCN_ROOT/models/ by copying and changing an existing one like pascal_voc. Note that you have to take care of the paths within the solver and the number of classes in the train and test prototxts. I recommend starting with the ZF model and the end2end algorithm; alt_opt is more complex and better once you have more experience.
  5. Create a config file under FRCN_ROOT/experiments/cfgs, also by copying and updating an existing one.
  6. Create or update an experiment script under FRCN_ROOT/experiments/scripts by modifying it for your dataset.
  7. Start training and testing by running the experiment script created in the previous step.
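For step 3, a minimal sketch of the factory.py entry (hedged: the class name my_dataset and its constructor are hypothetical placeholders for whatever you create in step 2; the pattern mirrors the existing pascal_voc entries):

```python
# Inside FRCN_ROOT/lib/datasets/factory.py, next to the pascal_voc registrations.
from datasets.my_dataset import my_dataset  # hypothetical dataset class from step 2

for split in ['train', 'val', 'test']:
    name = 'my_dataset_{}'.format(split)
    # The default argument freezes the current value of `split` in each lambda.
    __sets[name] = (lambda split=split: my_dataset(split))
```

After that you can pass the registered name (e.g. --imdb my_dataset_train) to the training scripts.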

These are just the main steps I figured out during my work with the framework. It will take some time to get into it, and several problems will come up when using the framework with your own dataset. Most problems are already addressed in other issues in this repo.

It might also be very helpful to use a python IDE that supports debugging.

Hope that helps. =)

Hi @ednarb29 , thanks sincerely for your answer, I will try it now. Hope I can do it.
In addition, the VID dataset has a lot of frames, more than one million. I am not quite sure whether the code will create a cache file for the VID dataset. Will it take a long time to load the frames every time?
Thank you again!

You can easily check that; the cache file should be under FRCN_ROOT/data/cache/

Of course, if this file is huge, I guess it takes some time even just to load the cache file. Maybe you should debug that. Naively, you can delete the cache file and start training again, so you can compare the time it takes to create the dataset versus loading the cache file.
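If it helps, a tiny sketch for inspecting (and optionally clearing) the cached roidb files so they are rebuilt from the current annotations (assumes you run it from FRCN_ROOT):

```python
import glob
import os

# List the cached roidb pickles and their sizes.
for f in glob.glob('data/cache/*.pkl'):
    print('{} ({:.1f} MB)'.format(f, os.path.getsize(f) / 1e6))
    # os.remove(f)  # uncomment to delete and force a rebuild on the next run
```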

Hi @ednarb29 , I have tried the method you described. There are some errors about selective_search that I can't handle, like the following.
[screenshot of the selective_search error]
In my opinion, Faster R-CNN doesn't use selective search, so I would prefer to comment out the code about selective search, such as "self.selective_search_roidb". But maybe that is not the right way to solve it. Could you please give me some suggestions?

@JohnnyY8 : Can you paste here your configuration information, which is printed in the terminal? I guess that your configuration file still chooses selective search as the proposal method.

@tiepnh Hi! You are right. Following the tutorial "https://github.com/deboc/py-faster-rcnn/tree/master/help", I used the command ($ echo 'MODELS_DIR: "$PY_FASTER_RCNN/models"' >> config.yml) to generate config.yml. But if I change it to "experiments/cfgs/faster_rcnn_end2end.yml", it looks OK.
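For anyone checking the same thing, a small sketch (assuming you run it from FRCN_ROOT with lib/ on the PYTHONPATH) to confirm which proposal method a config actually selects after merging:

```python
from fast_rcnn.config import cfg, cfg_from_file

# Merge the end2end experiment config into the defaults.
cfg_from_file('experiments/cfgs/faster_rcnn_end2end.yml')

print(cfg.TRAIN.PROPOSAL_METHOD)  # expected 'gt' for end2end training (RPN, no selective search)
print(cfg.TRAIN.HAS_RPN)          # expected True
```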

@tiepnh @ednarb29 I can start training now, so it looks close to the right way. I will check it on the validation set after training finishes. Thanks for your help, guys!!!
Another question is about factory.py, shown below. What does the split mean? If there are ["train", "val", "test"], what are they used for? train is for training; what are val and test for?
[screenshot of factory.py]

@JohnnyY8 : This array points to your image set files. In your pasted code, there is no image set file for testing, or the same image set is used for both training and testing.
Example: for pascal_voc, the script file will call this command for training:
time ./tools/train_net.py --gpu ${GPU_ID} \
  --solver models/${PT_DIR}/${NET}/faster_rcnn_end2end/solver.prototxt \
  --weights data/prdcv_models/${NET}.v2.caffemodel \
  --imdb ${TRAIN_IMDB} \
  --iters ${ITERS} \
  --cfg experiments/cfgs/faster_rcnn_end2end.yml \
  ${EXTRA_ARGS}
TRAIN_IMDB is "voc_2007_trainval" => it will load all images listed in the image set file ".....trainval.txt".
For testing, it uses TEST_IMDB="voc_2007_test" => it loads the images listed in the image set file "....test.txt" to test the trained network.
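For reference, a rough sketch (modeled on pascal_voc.py, not a verbatim copy) of how a split name ends up pointing at an image set file; devkit_path here is just a stand-in for wherever your data lives:

```python
import os

def image_set_file(devkit_path, year, image_set):
    # e.g. <devkit_path>/VOC2007/ImageSets/Main/trainval.txt -- one image index per line
    return os.path.join(devkit_path, 'VOC' + year, 'ImageSets', 'Main',
                        image_set + '.txt')

print(image_set_file('data/VOCdevkit2007', '2007', 'trainval'))  # used by TRAIN_IMDB=voc_2007_trainval
print(image_set_file('data/VOCdevkit2007', '2007', 'test'))      # used by TEST_IMDB=voc_2007_test
```

So "train", "val", and "test" are just different lists of image indices; which one is used depends only on the imdb name you pass as TRAIN_IMDB or TEST_IMDB.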

@tiepnh Cool! Your answer is very useful and clear! Thanks so much!
That means the ground truth of the PASCAL VOC 2007 test set is under the "Annotations" folder, right? Otherwise, it couldn't compute mAP after training finishes.
But I do not have the ground truth of the VID test set and I use TEST_IMDB="VID_val"; does that mean it will test on the validation set?

@tiepnh Hi!
I used this command to start training:

  • sudo ./tools/train_net.py --gpu 0 --iters 100000 --weights data/imagenet_models/ZF.v2.caffemodel --imdb VID_train --cfg ./experiments/cfgs/faster_rcnn_end2end.yml --solver models/pascal_voc/ZF/faster_rcnn_end2end/solver.prototxt

but I still got the following errors:
Traceback (most recent call last):
File "./tools/train_net.py", line 112, in
max_iters=args.max_iters)
File "/usr/local/caffes/xlw/faster-rcnn-third/tools/../lib/fast_rcnn/train.py", line 155, in train_net
roidb = filter_roidb(roidb)
File "/usr/local/caffes/xlw/faster-rcnn-third/tools/../lib/fast_rcnn/train.py", line 145, in filter_roidb
filtered_roidb = [entry for entry in roidb if is_valid(entry)]
File "/usr/local/caffes/xlw/faster-rcnn-third/tools/../lib/fast_rcnn/train.py", line 134, in is_valid
overlaps = entry['max_overlaps']
KeyError: 'max_overlaps'

Is there something wrong?

@JohnnyY8 :

That means the ground truth of PASCAL VOC 2007 test set is under the "Annotations" folder, right?
For both the test set and the train set, the ground truth of pascal_voc is under Annotations.

For TEST_IMDB, it just points to the set of images used for testing. So, if you use the same image set for TRAIN_IMDB and TEST_IMDB, it will train and test the network on the same dataset.
Secondly, you have to write your own test function. See this tutorial: https://github.com/deboc/py-faster-rcnn/tree/master/lib/datasets

About the "max_overlaps" error: it seems that your data has no foreground or background ROIs. So please check again the .py file you use to read your dataset.

@tiepnh Thank you so much! You are so nice.
I have found some bugs and restarted training.
Let's wait for the results.
Really, thanks for your help!

@tiepnh @ednarb29 Hi!
I restarted training, but a strange problem occurred. I printed some of the paths in train.txt, like this:
[screenshot of printed paths]
When I look at the printed information in the terminal, I notice that the data has been loaded many times! My teammate and I are pretty sure it went through the whole training set at least once, but this information shows it starting from 0000 again.
[screenshot of terminal output]
Could you please help me? We have been loading training data for more than 20 hours.
Thank you so much!

First, I would suggest you start training and testing with a very small data set (100 images and 1k iterations) so that you can debug the training and testing quite quickly.

Does the problem occur during creation of the data set or during training?

@ednarb29 I am not quite sure. Several times before, loading the data took about 2~4 hours (also loading repeatedly). But this time is stranger: we did not change any code, we just restarted the training, and the time for loading data is very long!

@ednarb29 Do you load the data only once after starting training?

I am not sure about that because this kind of problem did not occur for me... If I had problems with loading the data set, I just removed the cache file, and that solved the problem in most cases, because changes to the original data set are not reflected in the cache file. Sorry dude.

Hi @JohnnyY8,
I completely agree with ednarb29's idea: you should test with a (very) small dataset at first.
Moreover, I'm pretty sure that it's a bad idea to print anything for each data input. That may be the cause of the enormous additional loading time you got.

@ednarb29 No need to be sorry, I should thank you!
I will remove the cache file and restart training! Really thanks for your help!

@deboc That is right. I will try it. Thank you!
Will printing something really cause such a huge loading time?

commented

I just bet it's not negligible.
You were saying the loading time had risen from 4h to 20h, right? What did you change besides adding this print?

@deboc Oh, I see. We only added print statements, so it is strange for us.

Did removing the print command speed up the process?

And did removing the cache file and building the database again solve your problem with the KeyError: 'max_overlaps'?

@ednarb29 I didn't try removing the print command. Because I really want to see the process, I guess the time it consumes is negligible.
And removing the cache file works; my training now gets into the iterations. Thanks a lot!

Cool, so if it works fine you can close the issue? =)

@ednarb29 Sure, thank you very much!

@deboc , I have a quick question. I got the following error when I executed the following command:

Command:
./tools/train_faster_rcnn_alt_opt.py --gpu 0 --net_name INRIA_Person --weights data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel --imdb inria_train --cfg config.yml

Error:

.....
I0725 04:10:00.437233  3494 net.cpp:816] Ignoring source layer conv4_3
I0725 04:10:00.437252  3494 net.cpp:816] Ignoring source layer relu4_3
I0725 04:10:00.437268  3494 net.cpp:816] Ignoring source layer pool4
I0725 04:10:00.437296  3494 net.cpp:816] Ignoring source layer conv5_1
I0725 04:10:00.437314  3494 net.cpp:816] Ignoring source layer relu5_1
I0725 04:10:00.437331  3494 net.cpp:816] Ignoring source layer conv5_2
I0725 04:10:00.437350  3494 net.cpp:816] Ignoring source layer relu5_2
I0725 04:10:00.437366  3494 net.cpp:816] Ignoring source layer conv5_3
I0725 04:10:00.437384  3494 net.cpp:816] Ignoring source layer relu5_3
I0725 04:10:00.437397  3494 net.cpp:816] Ignoring source layer conv5_3_relu5_3_0_split
I0725 04:10:00.437405  3494 net.cpp:816] Ignoring source layer roi_pool5
F0725 04:10:00.737687  3494 net.cpp:829] Cannot copy param 0 weights from layer 'fc6'; shape mismatch.  Source param shape is 4096 25088 (102760448); target param shape is 4096 18432 (75497472). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.
*** Check failure stack trace: ***

I read that there's basically a mismatch between the sizes in the saved weights and the sizes the network has been set up to expect. The one thing I can imagine is that it is because I am using the faster-rcnn VGG16 model (data/faster_rcnn_models/VGG16_faster_rcnn_final.caffemodel). Is it possible to use this model instead of the one you mentioned (data/imagenet_models/VGG_CNN_M_1024.v2.caffemodel)?

P.S. Thank you for that awesome tutorial !

Hi GeorgiAngelov,
I see you are using a final faster-rcnn caffemodel as the pretrained network, but that one doesn't have a compatible fc6 layer, hence your issue.
The classical way for another dataset would be to use a pretrained caffe classifier for your data and use its train.prototxt to build a faster-rcnn model.
So I suggest you investigate which classifier was used in your pretrained model, and provide that caffemodel (e.g. VGG_CNN_M_1024.v2.caffemodel) instead of the faster-rcnn one in the weights option.

@GeorgiAngelov Hi!
I think the weights should be set to an ImageNet pretrained model, not a faster rcnn final model.
Hope it can help you.

@deboc, is the VGG_CNN_M_1024.v2.caffemodel considered a pre-trained model? I am wondering if this model in itself is already capable of classifying objects. My basic idea is that I would like to start training a model with my own data, but I would like that model to already be a trained model so I can leverage the weights.

My idea is that you can pretty much start with a trained .caffemodel file such as the VGG16_faster_rcnn_final.caffemodel and then train it even further. It appears that this might not be possible with this model in particular.

My question is: What does the v2 stand for in VGG_CNN_M_1024.v2.caffemodel and can I get a final model from this model to actually use it with tools/demo.py for example?

@JohnnyY8 , thank you for clarifying that. Until now, I was assuming that a model is a model is a model; I did not differentiate between a pretrained model and a final model. I guess I am still not clear on the distinction.

@GeorgiAngelov If you want to train on a final caffemodel and go further, it may be OK. Just pay attention to the differences in network architecture.
I also do not know what v2 means. But according to the tutorial, I treat it as the pre-trained model when I train Faster R-CNN on my own dataset. And the final caffemodel can be used directly to detect objects.

Some confusion here. Every .caffemodel contains a pretrained model, with the weights of a converged neural network. The ones of faster-rcnn just also happen to be called "final" models.

Before touching faster-rcnn I suggest you start by getting more used to the caffe deep learning framework. A lot of pre-trained models can be found in the model zoo and are ready to use. Most of them are classifiers that can infer an object class from an image. VGG_CNN_M_1024.v2.caffemodel is one of those (sorry, I don't know about the v2 either, but the originals are from there).
Indeed you can finetune a classifier by removing the last layer and adapting it for another dataset. For that you can carefully change the learning rate of each layer in order to balance between a "start from scratch" policy and a "reuse the former network" policy.
Good tutorials about caffe can be found on the Berkeley Vision website

Now about faster-rcnn. It's a framework for object detection, developed by R. Girshick. It uses the convnet classifier of your choice, and the training phase learns how to detect the objects classified by the underlying classifier.
That's why you need to reuse or finetune a classifier for your data, before even considering detection (and faster-rcnn).

So :

  • If your objects are already classified by a converged model from the caffe zoo (e.g. 'aeroplane', 'bicycle', 'bird', 'person', etc. for VGG), you can directly use this model to launch a faster-rcnn training
  • If not, forget faster-rcnn for now and take a look at the caffe tutorials to build your own classifier

@JohnnyY8 : Hey, could you share how you managed to solve the "max_overlaps" issue?

@vikiboy Hi, I do not remember it clearly; it seems that there were a few ground-truth xml files that did not contain any objects. I removed them and the corresponding images. Hope it can help you.

@vikiboy In addition, please pay attention to the coordinates in the ImageNet annotations; they start from 1, not 0. I remember that there are two places that need to be modified. The first one is lib/datasets/your_dataset.py. The second one is lib/datasets/imdb.py. I am not quite sure about what I remember; please try them.
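For context, the usual place this shows up (a hedged, hypothetical helper, not code from the repo): pascal_voc.py subtracts 1 from every coordinate because VOC annotations are 1-based, and if your own annotations are already 0-based that subtraction can produce negative values and trip the assertion in imdb.append_flipped_images.

```python
import xml.etree.ElementTree as ET
import numpy as np

def load_boxes(xml_path, one_based=True):
    """Parse VOC-style bounding boxes; set one_based=False if your
    annotations already use 0-based pixel coordinates."""
    offset = 1 if one_based else 0
    objs = ET.parse(xml_path).findall('object')
    boxes = np.zeros((len(objs), 4), dtype=np.uint16)
    for ix, obj in enumerate(objs):
        bbox = obj.find('bndbox')
        boxes[ix, :] = [float(bbox.find(tag).text) - offset
                        for tag in ('xmin', 'ymin', 'xmax', 'ymax')]
    return boxes
```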

Hi, I carried out ednarb29's method, but when I ran ./tools/train_faster_rcnn_alt_opt.py --gpu 0 --net_name INRIA_Person --weights data/imagenet_models/VGG_CNN_M_1024.v2.caffemodel --imdb inria_train --cfg config.yml , I got the error below.

Output will be saved to /home/keisan/py-faster-rcnn/output/default/train
Filtered 0 roidb entries: 1228 -> 1228
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1107 12:32:17.155658 12497 io.cpp:36] Check failed: fd != -1 (-1 vs. -1) File not found: ~/py-faster-rcnn/models/INRIA_Person/faster_rcnn_alt_optpt/stage1_rpn_solver60k80k.pt
*** Check failure stack trace: ***
The file "stage1_rpn_solver60k80k.pt" exists in ~/py-faster-rcnn/models/INRIA_Person/faster_rcnn_alt_opt.

What should I do?

@miyamon11 Hi:
I did not try to train a model with alt_opt. But according to the error info "~/py-faster-rcnn/models/INRIA_Person/**faster_rcnn_alt_optpt/**stage1_rpn_solver60k80k.pt", is there a problem here? I mean the "optpt" part.

I followed this tutorial but got the following errors:

Traceback (most recent call last):
File "./tools/train_net.py", line 113, in
max_iters=args.max_iters)
File "/media/username/DC1A-EA60/git14/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 157, in train_net
pretrained_model=pretrained_model)
File "/media/username/DC1A-EA60/git14/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 53, in init
self.solver.net.layers[0].set_roidb(roidb)
File "/media/username/DC1A-EA60/git14/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 68, in set_roidb
self._shuffle_roidb_inds()
File "/media/username/DC1A-EA60/git14/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 35, in _shuffle_roidb_inds
inds = np.reshape(inds, (-1, 2))
File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 224, in reshape
return reshape(newshape, order=order)
ValueError: total size of new array must be unchanged

Any ideas?

inds = np.reshape(inds, (-1, 2)): because the second dimension of the reshape is 2, you should use only an even number of images in the data set.
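For context, a paraphrased sketch of what _shuffle_roidb_inds in lib/roi_data_layer/layer.py does when cfg.TRAIN.ASPECT_GROUPING is enabled; with an odd number of images the final reshape fails exactly as in the traceback above. Using an even number of images, or turning TRAIN.ASPECT_GROUPING off in your config, should both avoid it:

```python
import numpy as np

# Toy roidb with 5 images (odd count) to reproduce the failure mode.
widths  = np.array([640, 640, 480, 500, 375])
heights = np.array([480, 480, 640, 375, 500])

# Horizontal and vertical images are permuted separately, then paired two at a time.
horz_inds = np.where(widths >= heights)[0]
vert_inds = np.where(widths < heights)[0]
inds = np.hstack((np.random.permutation(horz_inds),
                  np.random.permutation(vert_inds)))

inds = np.reshape(inds, (-1, 2))  # ValueError: total size of new array must be unchanged
```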

@GeorgiAngelov The tutorial of @deboc uses the image_net model VGG_CNN_M_1024.v2.caffemodel. You can get it by following the steps here https://github.com/deboc/py-faster-rcnn#download-pre-trained-imagenet-models.

@ednarb29

First, I would suggest you start training and testing with a very small data set (100 images and 1k iterations) so that you can debug the training and testing quite quickly.

Does the problem occur during creation of the data set or during training?

Thanks, I had the same problem:

overlaps = entry['max_overlaps']
KeyError: 'max_overlaps'

I deleted the cache file and it is now running.

@ednarb29

What tool should I use to create imdb files?

@ednarb29 , removing the cache file fixed the max_overlaps problem for me.

@ArturoDeza
What tool/code did you use to make the imdb file for training?

@VanitarNordic , I don't think there's a quick recipe for that. I've been following this setup:
https://github.com/smallcorgi/Faster-RCNN_TF
You will have to modify some lines of code in factory.py, and copy the pascal_voc.py file to your own my_dataset.py file and modify the lines of code regarding the number of training classes. (Besides also annotating all your images with .xml files.)
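A small sketch of the class-count changes usually involved (hedged: the class names below are hypothetical examples, and the fragment only shows the values you would plug into a my_dataset.py copied from pascal_voc.py):

```python
# In the copied dataset class, self._classes is the main thing to change;
# '__background__' must stay at index 0.
CLASSES = ('__background__',  # always index 0
           'spider')          # your own object classes go here

NUM_CLASSES = len(CLASSES)
# The prototxts must agree with this count: cls_score num_output = NUM_CLASSES,
# bbox_pred num_output = 4 * NUM_CLASSES.
print('cls_score: {}, bbox_pred: {}'.format(NUM_CLASSES, 4 * NUM_CLASSES))
```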

@ArturoDeza
Thanks, actually I have the annotated files, but I'm stuck on the imdb creation :-(

@VanitarNordic What is the error you've been getting? You should create a new issue with the error you get when you run the end2end training script, that way we can be more helpful.

@ArturoDeza
No, but I don't understand one thing: when we have a custom dataset, how is the model trained on it? The end-to-end training does not seem to have a dataset input parameter.

Hi!
I am getting the following error:
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "./tools/train_faster_rcnn_alt_opt.py", line 129, in train_rpn
max_iters=max_iters)
File "/home/siplab/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 160, in train_net
model_paths = sw.train_model(max_iters)
File "/home/siplab/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 101, in train_model
self.solver.step(1)
File "/home/siplab/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 144, in forward
blobs = self._get_next_minibatch()
File "/home/siplab/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 63, in _get_next_minibatch
return get_minibatch(minibatch_db, self._num_classes)
File "/home/siplab/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 22, in get_minibatch
assert(cfg.TRAIN.BATCH_SIZE % num_images == 0),
ZeroDivisionError: integer division or modulo by zero

Can anyone help me with that?

I"m using INRIA Person data set. After running below command

./tools/train_faster_rcnn_alt_opt.py --gpu 0 --net_name INRIA_Person --weights data/imagenet_models/VGG_CNN_M_1024.v2.caffemodel --imdb inria_train --cfg config.yml

I got an error:
File "./tools/train_faster_rcnn_alt_opt.py", line 62
print 'Loaded dataset {:s} for training'.format(imdb.name)
^
SyntaxError: invalid syntax

Can you please let me know the reason behind this error?

Do you have any solutions for this error?
Traceback (most recent call last):
File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "tools/train_faster_rcnn_alt_opt.py", line 129, in train_rpn
max_iters=max_iters)
File "/home/medhani/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 160, in train_net
model_paths = sw.train_model(max_iters)
File "/home/medhani/py-faster-rcnn/tools/../lib/fast_rcnn/train.py", line 101, in train_model
self.solver.step(1)
File "/home/medhani/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 144, in forward
blobs = self._get_next_minibatch()
File "/home/medhani/py-faster-rcnn/tools/../lib/roi_data_layer/layer.py", line 63, in _get_next_minibatch
return get_minibatch(minibatch_db, self._num_classes)
File "/home/medhani/py-faster-rcnn/tools/../lib/roi_data_layer/minibatch.py", line 27, in get_minibatch
assert(cfg.TRAIN.BATCH_SIZE % num_images == 0),
ZeroDivisionError: integer division or modulo by zero

Thanks

@medhani It's not finding any images, which means either the path to your images is wrong, or there are no images listed in your image set text file.
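A hedged debugging sketch along those lines (assumes lib/ is on the PYTHONPATH; 'inria_train' is the imdb name used in this thread, substitute your own):

```python
from datasets.factory import get_imdb

imdb = get_imdb('inria_train')
print('num_images: {}'.format(imdb.num_images))  # 0 here would explain the ZeroDivisionError

if imdb.num_images > 0:
    # Make sure this path actually exists on disk.
    print('first image: {}'.format(imdb.image_path_at(0)))
```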

Thanks Sean, I feel like there is a problem with my annotation file.
[screenshot of the annotation .xml file]

I'm training my network for spider detection. The annotation files are in .xml format. Is this the correct structure for the .xml file?

@Roskgp96 Have you been able to find a solution for the error below?
line 27, in get_minibatch
assert(cfg.TRAIN.BATCH_SIZE % num_images == 0),
ZeroDivisionError: integer division or modulo by zero

I used another modification of Faster R-CNN in TF, and it saves the permutation into snapshots. In my case, I traced the code and found out that I was using an OLD permutation loaded with my snapshot. That means that if you modified the number of testing or training images, it is possible that you would index outside the permutation array, get a zero index back, and then load nothing from the roidb. A simple solution is to delete all snapshots or modify the permutation in your train_val.py after it is loaded. Hope it helps.

@ivalab Thanks, after I deleted all the .pyc files under "$FRCN/lib/", it trains well without the ZeroDivisionError. @medhani Have you solved the problem? You could also try this method.

@deboc Apologies for digging up an old discussion topic, but you mentioned that we have the option to reuse a pre-trained model that already classifies our objects OR train our own model from scratch. Would that put any restrictions on how we train our faster R-CNN? Would the approximate joint (end-2-end) approach be better than the alternating training method?

Hi,
I'm trying to train the net on my own dataset, which I created using video of a microphone. It seems that I did everything as ednarb29 wrote (starting from the model I got from training on VOC2007), but the results are really surprising:

  1. Testing a picture from my dataset gives me the proper region and class=microphone (the only class (+background) I kept during training) with 1.0 probability.
  2. Testing a picture not from my dataset gives me nothing. That can be explained, I think, by my dataset being too small (hundreds of pics of one mic).
  3. What really surprised me is that any picture from the voc dataset gives me bounding boxes around the voc objects with the microphone label and a lower probability.
    What have I done wrong?

Excuse me. When I finished training my own model and used it with demo.py to run detection, the results came out all white (including the image) whenever the input image was very large (5000 x 3000 pixels). If the image is not too large, there is no problem. What could be the reason?

@mantou22 sorry, I do not understand "the results were all white"?

I"m using INRIA Person data set. After running below command

./tools/train_faster_rcnn_alt_opt.py --gpu 0 --net_name INRIA_Person --weights data/imagenet_models/VGG_CNN_M_1024.v2.caffemodel --imdb inria_train --cfg config.yml

I got an error:
File "./tools/train_faster_rcnn_alt_opt.py", line 62
print 'Loaded dataset {:s} for training'.format(imdb.name)
^
SyntaxError: invalid syntax

Can you please let me know the reason behind this error?

Have you fixed it?
I met the same problem.

I"m using INRIA Person data set. After running below command
./tools/train_faster_rcnn_alt_opt.py --gpu 0 --net_name INRIA_Person --weights data/imagenet_models/VGG_CNN_M_1024.v2.caffemodel --imdb inria_train --cfg config.yml
I got an error:
File "./tools/train_faster_rcnn_alt_opt.py", line 62
print 'Loaded dataset {:s} for training'.format(imdb.name)
^
SyntaxError: invalid syntax
Can you please let me know the reason behind this error?

Have you fixed it?
I met the same problem.

Hey, I have the same problem. Have you fixed it?