rnjtsh / graphical-object-detector

Graphical Object Detection in Document Images

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Graphical Object Detection in document images

This repository contains end-to-end trainable deep learning based framework to localize graphical objects in the document images called as Graphical Object Detection (GOD).

This repository is built on jwyang/faster-rcnn.pytorch. This implementation has the following features:

  • It is pure Pytorch code. Of course, there are some CUDA code.

  • It supports multi-image batch training.

  • It supports multiple GPUs training.

The results of GOD on different datasets is listed in the paper.

Getting Started

Clone the repo:

    git clone https://github.com/rnjtsh/graphical-object-detector.git

Then, create a folder:

    cd GOD && mkdir data

prerequisites

  • Python 2.7 or 3.6
  • Pytorch 0.4.0
  • CUDA 8.0 or higher

Compilation

The compilation is done as instructed by jwyang/faster-rcnn.pytorch.

Dataset

This repository uses the dataset in the same format as PASCAL VOC. But other format of datasets can also be adapted as done by jwyang/faster-rcnn.pytorch. The dataset should be prepared as per the following tree structure.

    GODdevkit2019
      ├── GOD2019
          ├── JPEGImages
          │   ├──  GOD001.jpg
          │   ├──  GOD002.jpg
          │   ├──  ...
          ├── ImageSets
          │   ├──  Main
          │   │    ├──  train.txt
          │   │    ├──  val.txt
          │   │    ├──  test.txt
          │   │    ├──  ...
          └── Annotations
              ├──  GOD001.xml
              ├──  GOD002.xml
              ├──  ...

Pretrained Models

We used ImageNet pretrained weights (VGG16 and ResNets) from Caffe in our experiments. You can download these two models from:

Download them and put them into the data/pretrained_model/.

If you want to use pytorch pre-trained models, please remember to transpose images from BGR to RGB, and also use the same data transformer (minus mean and normalize) as used in pretrained model.

Citation

@inproceedings{saha2019graphical,
  title={Graphical Object Detection in Document Images},
  author={Saha, Ranajit and Mondal, Ajoy and Jawahar, CV},
  booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)},
  pages={51--58},
  year={2019},
  organization={IEEE}
}

About

Graphical Object Detection in Document Images

License:MIT License


Languages

Language:Python 72.0%Language:C 15.0%Language:Cuda 11.4%Language:C++ 0.6%Language:Shell 0.5%Language:MATLAB 0.5%