AtriSaxena / OIDv4_to_VOC

Convert the Open Images v4 dataset to Pascal VOC format XML. Open Images is a dataset of ~9 million images that have been annotated with image-level labels and bounding boxes spanning thousands of classes. https://github.com/openimages/dataset

OIDv4 To VOC XML format

If you have experience working with the Pascal VOC format but have not been able to work with Open Images Dataset v4, which has 600 classes, the steps below show how to download images per class and convert their annotations to XML files.

The code is documented and easy to understand. Please see the usage steps below.

Open Image Dataset v4

All the information related to this huge dataset can be found on its official website. The few lines below simply summarize some statistics and important tips.

          Train        Validation   Test      #Classes
Images    1,743,042    41,620       125,436   -
Boxes     14,610,229   204,621      625,282   600

Getting Started

Installation

Python 3 is required.

  1. Clone this repository.
   git clone https://github.com/AtriSaxena/OIDv4_to_VOC.git
  2. Install the required packages.
   pip3 install -r requirements.txt

Peek inside the requirements file to check whether you already have everything installed. Most of the dependencies are common libraries.

Launch the ToolKit to check the available options

If you simply want a quick reminder of all the options the script offers, launch OIDv4_to_VOC.py from your console of choice. Remember to always run it from the main directory of the project:

python3 OIDv4_to_VOC.py

or in the following way to get more information

python3 OIDv4_to_VOC.py -h

Download the Dataset

To download the dataset per class, go to this repository: https://github.com/EscVM/OIDv4_ToolKit

Read its README.MD file for instructions on downloading the classes you need.

Make Annotation into XML format.

To convert a class, say 'Apple', give the source path of the class folder, which holds the images and a Labels folder:

└───Apple

    |0fdea8a716155a8e.jpg
    |2fe4f21e409f0a56.jpg
    |...
    └───Labels
            |0fdea8a716155a8e.txt
            |2fe4f21e409f0a56.txt
            |...
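Before converting, it can help to confirm that every image has a matching label file. The snippet below is a minimal sketch that assumes the layout shown above (`.jpg` images next to a `Labels` folder of `.txt` files). The helper name `find_unpaired` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def find_unpaired(class_dir):
    """Return (images_without_label, labels_without_image) as sorted lists of file stems."""
    class_dir = Path(class_dir)
    # Image stems, e.g. "0fdea8a716155a8e" for "0fdea8a716155a8e.jpg".
    image_stems = {p.stem for p in class_dir.glob("*.jpg")}
    # Label stems taken from the Labels subfolder.
    label_stems = {p.stem for p in (class_dir / "Labels").glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

Calling `find_unpaired("Dataset/train/Apple")` returns two lists. If both are empty, every image has a label file and every label file has an image.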

Then give the destination path where the converted XML files should be stored:

python3 OIDv4_to_VOC.py --sourcepath Dataset/train/Apple --dest_path Dataset/train/Annotation/Apple

After the script runs, the annotations are saved in the destination path.
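To give a rough idea of what such a conversion involves, here is a minimal sketch. It is not the code in OIDv4_to_VOC.py. It assumes each label line has the form `<class name> <left> <top> <right> <bottom>` in pixel coordinates, as OIDv4_ToolKit appears to write them, and it takes the image width and height as arguments instead of reading the image. The function names `parse_label_line` and `build_voc_xml` are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_label_line(line):
    """Split one label line into (class_name, xmin, ymin, xmax, ymax).

    Assumed format: "<class name> <left> <top> <right> <bottom>" in pixels.
    The class name may contain spaces, so the last four tokens are the box.
    """
    tokens = line.split()
    name = " ".join(tokens[:-4])
    xmin, ymin, xmax, ymax = (int(round(float(t))) for t in tokens[-4:])
    return name, xmin, ymin, xmax, ymax


def build_voc_xml(filename, width, height, label_lines, folder="Apple", depth=3):
    """Build a Pascal VOC <annotation> element for one image."""
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = str(depth)

    for line in label_lines:
        if not line.strip():
            continue  # skip blank lines
        name, xmin, ymin, xmax, ymax = parse_label_line(line)
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(value)
    return root


# Illustrative values only; the box and image size are not taken from the dataset.
annotation = build_voc_xml("0fdea8a716155a8e.jpg", 1024, 768, ["Apple 10.2 20.0 200.25 300.75"])
xml_text = ET.tostring(annotation, encoding="unicode")
```

Passing the image size in keeps the sketch free of third-party dependencies. A real converter would normally read the width and height from the image file itself.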

License: MIT License


Languages

Language: Python 100.0%