dataset-generation object-detection python3 xml

OIDv4 To VOC XML format

If you are used to working with the Pascal VOC format but have not been able to work with Open Images Dataset v4, which has 600 classes, the steps below show how to download images per class and convert the annotations to XML files.

The code is documented and easy to understand. See the usage steps below.

Open Images Dataset v4

All the information related to this huge dataset can be found here. The few lines below simply summarize some statistics.

	Train	Validation	Test	#Classes
Images	1,743,042	41,620	125,436	-
Boxes	14,610,229	204,621	625,282	600

Getting Started

Installation

Python3 is required.

Clone this repository.

   git clone https://github.com/AtriSaxena/OIDv4_to_VOC.git

Install the required packages.

   pip3 install -r requirements.txt

Check the requirements file to see whether you already have everything installed. Most of the dependencies are common libraries.

Launch the ToolKit to check the available options

For a quick reminder of all the options the script offers, run OIDv4_to_VOC.py from your console of choice. Remember to always run it from the main directory of the project:

   python3 OIDv4_to_VOC.py

Or run it with -h to get more information:

   python3 OIDv4_to_VOC.py -h

Download the Dataset

To download the dataset per class, go to this repository: https://github.com/EscVM/OIDv4_ToolKit

Follow its README.md to download the classes you need.

Convert Annotations to XML Format

To convert a class, say 'Apple', give the source path of the class folder, which contains the images and a Labels folder.

└───Apple
    |0fdea8a716155a8e.jpg
    |2fe4f21e409f0a56.jpg
    |...
    └───Labels
            |0fdea8a716155a8e.txt
            |2fe4f21e409f0a56.txt
            |...
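
Before converting, it can help to confirm that every image in the class folder has a matching label file. The sketch below is not part of this repository. It assumes the layout shown above (.jpg images directly in the class folder, one .txt file per image inside Labels), and find_unpaired is a hypothetical helper name chosen for illustration.

```python
from pathlib import Path


def find_unpaired(class_dir):
    """Return (images without a label file, label files without an image)."""
    class_dir = Path(class_dir)
    # Image stems, e.g. "0fdea8a716155a8e" for 0fdea8a716155a8e.jpg
    images = {p.stem for p in class_dir.glob("*.jpg")}
    # Label stems from the Labels subfolder (assumed: one .txt per image)
    labels = {p.stem for p in (class_dir / "Labels").glob("*.txt")}
    return sorted(images - labels), sorted(labels - images)


if __name__ == "__main__":
    # Example path taken from the usage command in this README.
    missing_labels, missing_images = find_unpaired("Dataset/train/Apple")
    print("images without labels:", missing_labels)
    print("labels without images:", missing_images)
```

Two empty lists mean every image and label file is paired.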

Also give the destination path where the converted XML files will be stored.

   python3 OIDv4_to_VOC.py --sourcepath Dataset/train/Apple --dest_path Dataset/train/Annotation/Apple

After running the script, the annotations will be saved in the destination path.
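
To give a rough idea of what such a conversion involves, here is a minimal sketch of turning the label lines for one image into a Pascal VOC style XML tree. It is not the code in OIDv4_to_VOC.py. It assumes each label line has the form `<class name> <left> <top> <right> <bottom>` in pixel coordinates, as written by OIDv4_ToolKit, and that the caller already knows the image width and height. The function names parse_label_line and build_voc_xml are hypothetical, and only a common subset of VOC fields is written.

```python
import xml.etree.ElementTree as ET


def parse_label_line(line):
    """Split one label line into (class_name, xmin, ymin, xmax, ymax)."""
    parts = line.split()
    # The last four tokens are the box; anything before them is the class
    # name, which may contain spaces (e.g. "Traffic light").
    name = " ".join(parts[:-4])
    # Coordinates are truncated to whole pixels.
    xmin, ymin, xmax, ymax = (int(float(v)) for v in parts[-4:])
    return name, xmin, ymin, xmax, ymax


def build_voc_xml(filename, width, height, label_lines, depth=3):
    """Build a Pascal VOC style <annotation> element for one image."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for line in label_lines:
        if not line.strip():
            continue  # skip blank lines
        name, xmin, ymin, xmax, ymax = parse_label_line(line)
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin),
                           ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(box, tag).text = str(value)
    return root


if __name__ == "__main__":
    # Made-up image size and box values, for illustration only.
    xml_root = build_voc_xml("0fdea8a716155a8e.jpg", 1024, 768,
                             ["Apple 10.5 20.0 300.25 400.75"])
    print(ET.tostring(xml_root, encoding="unicode"))
```

Taking the last four tokens as the box keeps multi-word class names intact. The image size is passed in so the sketch needs only the standard library; a real converter would typically read it from the image file.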

About

Convert Open Images Dataset v4 annotations to Pascal VOC format XML. Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. https://github.com/openimages/dataset

MIT License