Graphical Object Detection in Document Images

This repository contains an end-to-end trainable, deep-learning-based framework for localizing graphical objects in document images, called Graphical Object Detection (GOD).

This repository is built on jwyang/faster-rcnn.pytorch. This implementation has the following features:

It is written in PyTorch, apart from some CUDA code.
It supports multi-image batch training.
It supports multi-GPU training.

The results of GOD on different datasets are listed in the paper.

Getting Started

Clone the repo:

    git clone https://github.com/rnjtsh/graphical-object-detector.git

Then, create a folder:

    cd GOD && mkdir data

Prerequisites

Python 2.7 or 3.6
PyTorch 0.4.0
CUDA 8.0 or higher

Compilation

Follow the compilation instructions in jwyang/faster-rcnn.pytorch.

Dataset

This repository expects datasets in the PASCAL VOC format. Other dataset formats can be adapted in the same way as in jwyang/faster-rcnn.pytorch. Prepare the dataset according to the following directory structure:

    GODdevkit2019
      ├── GOD2019
          ├── JPEGImages
          │   ├──  GOD001.jpg
          │   ├──  GOD002.jpg
          │   ├──  ...
          ├── ImageSets
          │   ├──  Main
          │   │    ├──  train.txt
          │   │    ├──  val.txt
          │   │    ├──  test.txt
          │   │    ├──  ...
          └── Annotations
              ├──  GOD001.xml
              ├──  GOD002.xml
              ├──  ...
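
Before training, it can help to confirm that every image ID listed in a split file has both an image and an annotation. The following is a minimal sketch, not part of this repository: the function name `find_missing_files` is hypothetical, and it assumes the layout shown above with `.jpg` images and `.xml` annotations.

```python
import os


def find_missing_files(devkit_root, split="train", dataset="GOD2019"):
    """Return (missing_images, missing_annotations) for one split.

    Hypothetical helper; assumes the GODdevkit2019 layout shown above.
    """
    base = os.path.join(devkit_root, dataset)
    split_file = os.path.join(base, "ImageSets", "Main", split + ".txt")
    with open(split_file) as f:
        # VOC split files list one image ID per line, without an extension.
        image_ids = [line.split()[0] for line in f if line.strip()]

    missing_images = [
        image_id for image_id in image_ids
        if not os.path.isfile(os.path.join(base, "JPEGImages", image_id + ".jpg"))
    ]
    missing_annotations = [
        image_id for image_id in image_ids
        if not os.path.isfile(os.path.join(base, "Annotations", image_id + ".xml"))
    ]
    return missing_images, missing_annotations
```

For example, `find_missing_files("data/GODdevkit2019", split="val")` returns two lists of IDs; both are empty when the split is complete. The `data/GODdevkit2019` location is an assumption about where the devkit is placed.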

Pretrained Models

We used ImageNet-pretrained weights from Caffe (VGG16 and ResNets) in our experiments. You can download these models from:

VGG16
ResNet50, ResNet101, ResNet152

Download them and place them in data/pretrained_model/.

If you want to use PyTorch pretrained models, remember to convert images from BGR to RGB and to apply the same data transform (mean subtraction and normalization) that was used for the pretrained model.
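
The conversion described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's own loader: the function name `preprocess_for_pytorch_weights` is hypothetical, the input is assumed to be an HxWx3 `uint8` image in BGR order (as Caffe-style pipelines typically produce), and the mean and standard deviation are the ImageNet statistics documented for torchvision's pretrained models.

```python
import numpy as np

# ImageNet per-channel statistics in RGB order, as documented by torchvision.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess_for_pytorch_weights(image_bgr):
    """Convert an HxWx3 uint8 BGR image to a normalized 3xHxW float32 RGB array.

    Hypothetical helper; assumes the input channel order is BGR.
    """
    image_rgb = image_bgr[:, :, ::-1]                # reverse channel axis: BGR -> RGB
    scaled = image_rgb.astype(np.float32) / 255.0    # scale pixel values to [0, 1]
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Reorder HWC -> CHW, the layout PyTorch convolution layers expect.
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))
```

The resulting array can be wrapped with `torch.from_numpy` before it is passed to the network. If the pretrained model was trained with a different transform, substitute that model's statistics.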

Citation

If you find this work useful, please cite the following paper: Ranajit Saha, Ajoy Mondal and C V Jawahar, "Graphical Object Detection in Document Images", ICDAR 2019.
