AlvaroCavalcante / auto_annotate

Labeling is boring. Use this tool to speed up your next object detection project!

Home Page: https://medium.com/p/acf410a600b8#9e0e-aaa30a9f4b7a

How to set the dimensions for the bounding box?

nkulkarni3297 opened this issue · comments

Is there any way I can set predefined dimensions for the bounding boxes in my images?

On running the "detection_img_tf2.py" script, it creates the bounding box dimensions on its own, but in my case I want to predefine the dimensions for a particular image.

I would like to illustrate what is currently happening with an example.

Image 1:
Image_212

Image 2:
Image_195

On running the script, it creates bounding boxes for these two almost identical images, but the boxes have different dimensions. It also detects the label in image 1 with the wrong dimensions, when it should detect the label as in image 2 and draw boxes with those dimensions for all images inside the folder, but it doesn't. So I want to know if I can define the box dimensions beforehand.

Also, what model should I use for such custom images? Out of 1k images in the folder, it only detects around 12-15 with a label and creates XML files for them. I am currently using the ssd_mobilenet_320x320_coco2017 model file for this.

Thanks for contributing this issue! We will be replying soon.

Hello @nkulkarni3297 thank you for using the project!

Actually, the bounding box dimensions are determined by the model itself (through the model's inference), so it's expected to have some incorrect dimensions if your model makes mistakes, as you showed. The intention of this project is to be a semi-supervised helper for image labeling, so you'll need to manually fix the bad predictions.
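To make that concrete, here is a minimal sketch (not the library's actual code; the array values and the threshold are illustrative assumptions) of how a TF2 detection model's output turns into box dimensions: the model returns normalized coordinates and confidence scores, and the boxes are simply scaled to the image size, so there is no way to predefine them.

```python
import numpy as np

# Hypothetical detection output: most TF2 detection models return normalized
# [ymin, xmin, ymax, xmax] boxes plus a confidence score per detection.
boxes = np.array([[0.10, 0.20, 0.55, 0.70]])  # normalized coordinates
scores = np.array([0.87])

image_height, image_width = 480, 640
threshold = 0.5  # assumed confidence cutoff

for box, score in zip(boxes, scores):
    if score < threshold:
        continue  # low-confidence predictions are discarded, not resized
    ymin, xmin, ymax, xmax = box
    # The box dimensions come entirely from the model's prediction,
    # scaled to pixel coordinates -- there is no predefined size.
    pixel_box = (int(xmin * image_width), int(ymin * image_height),
                 int(xmax * image_width), int(ymax * image_height))
    print(pixel_box)
```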

As I explained, you'll need to manually label some images to train an initial model, and then use this model to help in the annotation of the complete dataset.

That said, if your initial model was trained with too few images or you have used an oversimplified architecture (like ssd_mobilenet_320x320), you'll probably get some poor predictions and only detect some images (like the 12-15 that you mentioned).

I recommend using at least 100 images for the initial training and trying a more robust model (EfficientDet, ResNet) or fine-tuning your SSD model. After that, you will definitely get better results using this library!

About the error in the label ("N/A"), this is actually very strange; it's probably something wrong in your label_map.pbtxt. Please follow the same format shown in the TensorFlow documentation!
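For reference, a label map in the format expected by the TensorFlow Object Detection API looks like the snippet below (the class names are placeholders for your own labels; IDs must start at 1):

```
item {
  id: 1
  name: 'class_one'
}
item {
  id: 2
  name: 'class_two'
}
```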

Hello @AlvaroCavalcante, you are saying to manually label images to train an initial model. So here is what will happen then.

I want to auto-annotate images to create XML files so that I can train a model to detect the signs. But if I create an initial model and run auto-annotation with it, then I could just use my initial model directly to train my final model.

Clarifying my point here:

I am working on Sign Language Detection using this repo
https://github.com/nicknochnack/RealTimeObjectDetection

Now, here I need to manually create XML files for some images and train those images and files on top of the SSD model to get the detections. So my flow would be like this:

  • Label some images manually and create XML files for the initial model.
  • Use that model to label the entire dataset and create XML files.
  • Train those files again on top of the SSD model.
  • Run the detection model to get the detections.

So in this process, I could just use the initial model directly. Then what would be the use of auto_annotate? I want to reduce the steps, so I am trying this.

If you could guide me a little bit on this, it would help me a lot.

Hello @nkulkarni3297, I'm not sure if I understood your whole context, but I'll try to explain based on what you asked.

The idea of the auto-annotation package is to be used as a semi-supervised tool, so it's impossible to avoid the manual annotation step unless you find an open-source model trained by someone else that can serve as this "initial model".

Given that fact, your flow will be something like this:

  • Manually annotate some images of your dataset.
  • Train your initial model.
  • Use your initial model with auto_annotate to create new labels for the entire dataset.
  • Review the auto-generated annotations to improve the quality.
  • Retrain your model and be happy.

Let's suppose that you have 1000 images in your dataset. Considering that flow, you'll only spend time labeling 100 images and quickly reviewing the auto-generated labels.

In a "normal" scenario, you would need to manually label your 1000 images, which would use much more time!

In the end, this package is very simple, since we just use your model's predictions to create an XML structure following the Pascal VOC format!
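As a rough illustration of that last point (a simplified sketch, not the package's actual implementation; the function name and the single-object layout are assumptions), turning one prediction into a Pascal VOC annotation is essentially a matter of building an XML tree like this:

```python
import xml.etree.ElementTree as ET

def prediction_to_voc_xml(filename, width, height, label, box):
    """Build a minimal Pascal VOC annotation for a single detection.

    `box` is assumed to be pixel coordinates (xmin, ymin, xmax, ymax);
    real tools usually handle several objects per image plus extra fields.
    """
    annotation = ET.Element("annotation")
    ET.SubElement(annotation, "filename").text = filename

    size = ET.SubElement(annotation, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"

    obj = ET.SubElement(annotation, "object")
    ET.SubElement(obj, "name").text = label
    bndbox = ET.SubElement(obj, "bndbox")
    for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), box):
        ET.SubElement(bndbox, tag).text = str(int(value))

    return ET.tostring(annotation, encoding="unicode")

print(prediction_to_voc_xml("Image_212.jpg", 640, 480, "my_label", (128, 48, 448, 264)))
```

The resulting XML files can then be opened in a tool such as labelImg for the manual review step.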

If you have more doubts, let me know!!

@AlvaroCavalcante Thank you for the response. I'll also look for a way to contribute an alternate solution to this question that could be useful in the future, and I will definitely post it here.

Awesome @nkulkarni3297, thank you for your contribution! This week I released a new version of this library; check this Medium article to see the details. I hope this new version can help you even more.