BMW-InnovationLab / SORDI-Data-Pipeline-Reader

The SORDI dataset ships one JSON annotation file per frame. The following tools create a COCO-style annotation out of these, so the SORDI data can easily be fed into COCO-style training pipelines.

Loading BMW SORDI into NVIDIA DALI Pipeline (COCO based)

Sample code for consuming the COCO-style SORDI annotations with an NVIDIA DALI pipeline is included and can be used with PyTorch, TensorFlow, etc.

Prerequisites

  • NVIDIA Docker 2
  • Docker CE latest stable release

Unzipping SORDI dataset

Open a terminal in the SORDI folder and execute the following command to unzip all archives.

for i in *.zip; do unzip "$i"; done

Delete the zip files using the following command.

rm *.zip

This is how the directory structure should look after unzipping.

ls -l SORDI

Building and Running the docker image

The base image comes from the NGC cloud (ngc.nvidia.com).

Please register there if you have not done so already (takes about 3 minutes).

Before you start, map the SORDI directory into the Docker container:

open 1_run.sh

Change /home/me/SORDI to the path of the extracted SORDI dataset folder.

1 - Log in to the NVIDIA NGC registry using the following command:

docker login nvcr.io

2 - Log in to Docker using the following command:

docker login

3 - Build and run the image using the following command:

source 1_run.sh

When done, you should end up inside the running container.

Run 2_traverse_unzipped_SORDI.ipynb

This notebook walks through the unzipped SORDI files, opens an SQLite database, and creates an entry in the FRAMES table for each frame and annotation.
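The traversal result can be sketched roughly as follows. This is not the notebook's actual code; the FRAMES column names and the SORDI per-frame JSON field names (`ObjectClassName`, `Left`, `Top`, `Right`, `Bottom`) are assumptions for illustration:

```python
import sqlite3

# Hypothetical sketch: one FRAMES row per (frame, annotation) pair.
# Column names and JSON field names are assumed, not verified.
con = sqlite3.connect("SORDI.sqlite")
con.execute("""CREATE TABLE IF NOT EXISTS FRAMES (
    frame TEXT, class TEXT,
    x_min INTEGER, y_min INTEGER, x_max INTEGER, y_max INTEGER)""")

frame = "images/0001.png"
annotations = [  # example content of one per-frame JSON file
    {"ObjectClassName": "klt_box", "Left": 10, "Top": 20, "Right": 110, "Bottom": 140},
]
con.executemany(
    "INSERT INTO FRAMES VALUES (?, ?, ?, ?, ?, ?)",
    [(frame, a["ObjectClassName"], a["Left"], a["Top"], a["Right"], a["Bottom"])
     for a in annotations])
con.commit()
```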

Inside the terminal run:

jupyter notebook

Open the provided URL in a browser and run 2_traverse_unzipped_SORDI.ipynb.

You will find the created SQLite database in the workspace folder. Check its entries via:

sqlite3 SORDI.sqlite
.tables
select * from FRAMES limit 10;

Feel free to derive additional table entries, such as:

  • Number of objects in the frame
  • Overlap/pixel overlap of objects in the frame
  • Uncertainty estimation
  • Single class or multiclass
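Entries like the object count and the single-/multiclass flag can be derived directly from the existing rows. A minimal sketch, assuming a FRAMES table with `frame` and `class` columns (sample data below is made up):

```python
import sqlite3

# In-memory stand-in for the assumed FRAMES schema, with made-up rows
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE FRAMES (frame TEXT, class TEXT)")
con.executemany("INSERT INTO FRAMES VALUES (?, ?)", [
    ("0001.png", "klt_box"), ("0001.png", "pallet"), ("0002.png", "klt_box")])

# Per frame: object count and whether more than one class occurs
stats = {frame: (n_objects, n_classes > 1)
         for frame, n_objects, n_classes in con.execute(
             "SELECT frame, COUNT(*), COUNT(DISTINCT class) FROM FRAMES GROUP BY frame")}
```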

Run 3_create_coco_annotation.ipynb

Run the notebook to create the COCO annotation file.

The outcome is the file:

sordi.coco
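The export can be sketched roughly like this. It is not the notebook's actual code; the FRAMES schema, the x/y/width/height bbox convention, and the ID assignment are assumptions for illustration:

```python
import json
import sqlite3

# Stand-in database with an assumed FRAMES schema (bbox stored as x, y, w, h)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE FRAMES (frame TEXT, class TEXT, x INT, y INT, w INT, h INT)")
con.execute("INSERT INTO FRAMES VALUES ('images/0001.png', 'klt_box', 10, 20, 100, 120)")

image_ids, cat_ids, annotations = {}, {}, []
for frame, cls, x, y, w, h in con.execute("SELECT * FROM FRAMES"):
    img_id = image_ids.setdefault(frame, len(image_ids) + 1)
    cat_id = cat_ids.setdefault(cls, len(cat_ids) + 1)
    annotations.append({"id": len(annotations) + 1, "image_id": img_id,
                        "category_id": cat_id, "bbox": [x, y, w, h],
                        "area": w * h, "iscrowd": 0})

coco = {"images": [{"id": i, "file_name": f} for f, i in image_ids.items()],
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in cat_ids.items()]}
with open("sordi.coco", "w") as f:
    json.dump(coco, f)
```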

This is a great place to filter the training dataset in a smart manner, e.g. selecting only multiclass training frames with a certain object overlap. For now, this notebook does not filter at all but exports all data found in the database.
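A hypothetical filter of that kind, keeping only multiclass frames from a COCO-style dict (the sample data below is made up):

```python
# Made-up COCO-style sample: image 1 has two classes, image 2 only one
coco = {
    "images": [{"id": 1, "file_name": "0001.png"}, {"id": 2, "file_name": "0002.png"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [10, 20, 100, 120]},
        {"id": 2, "image_id": 1, "category_id": 2, "bbox": [30, 40, 50, 60]},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [5, 5, 20, 20]},
    ],
    "categories": [{"id": 1, "name": "klt_box"}, {"id": 2, "name": "pallet"}],
}

# Collect the set of category ids per image, then keep multiclass images only
cats_per_image = {}
for ann in coco["annotations"]:
    cats_per_image.setdefault(ann["image_id"], set()).add(ann["category_id"])
keep = {img_id for img_id, cats in cats_per_image.items() if len(cats) > 1}

filtered = dict(coco,
                images=[im for im in coco["images"] if im["id"] in keep],
                annotations=[a for a in coco["annotations"] if a["image_id"] in keep])
```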

Run 4_run_DALI_coco_pipeline.ipynb

Ready to run the pipeline? Let's go. NVIDIA DALI performs image decompression and augmentations on the GPU. Since the annotation file can get large, the initial loading and parsing take a moment.

Acknowledgments

  • Adolf Hohl
  • Ziad Saoud, BMW Group TechOffice MUNICH
  • Chafic Abou Akar, BMW Group TechOffice MUNICH

License: Apache License 2.0

