This repository contains an end-to-end trainable deep learning framework, called Graphical Object Detection (GOD), for localizing graphical objects in document images.
This repository is built on jwyang/faster-rcnn.pytorch. This implementation has the following features:
- It is pure PyTorch code, apart from some CUDA code.
- It supports multi-image batch training.
- It supports multi-GPU training.
The results of GOD on different datasets are listed in the paper.
Clone the repo:
git clone https://github.com/rnjtsh/graphical-object-detector.git
Then create a data folder inside the cloned repository:
cd graphical-object-detector && mkdir data
- Python 2.7 or 3.6
- PyTorch 0.4.0
- CUDA 8.0 or higher
Compile the code by following the instructions in jwyang/faster-rcnn.pytorch.
This repository expects the dataset in the same format as PASCAL VOC. Other dataset formats can also be adapted, as is done in jwyang/faster-rcnn.pytorch. Prepare the dataset according to the following tree structure.
GODdevkit2019
└── GOD2019
    ├── JPEGImages
    │   ├── GOD001.jpg
    │   ├── GOD002.jpg
    │   └── ...
    ├── ImageSets
    │   └── Main
    │       ├── train.txt
    │       ├── val.txt
    │       ├── test.txt
    │       └── ...
    └── Annotations
        ├── GOD001.xml
        ├── GOD002.xml
        └── ...
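Before training, it can help to confirm that every image id listed in a split file has both an image and an annotation. The following is a minimal sketch, not part of this repository: the function name `find_missing_files` and its parameters are hypothetical, and it assumes the split files list one image id per line (as in PASCAL VOC) and that images use the `.jpg` extension shown in the tree above.

```python
import os


def find_missing_files(devkit_root, year="2019", split="train"):
    """Return (missing_images, missing_annotations) for one split.

    Hypothetical helper: assumes the layout
    <devkit_root>/GOD<year>/{JPEGImages,ImageSets/Main,Annotations}.
    """
    base = os.path.join(devkit_root, "GOD" + year)
    split_file = os.path.join(base, "ImageSets", "Main", split + ".txt")

    with open(split_file) as f:
        # The first whitespace-separated token on each non-empty line is the image id.
        image_ids = [line.split()[0] for line in f if line.strip()]

    missing_images = [
        image_id for image_id in image_ids
        if not os.path.isfile(os.path.join(base, "JPEGImages", image_id + ".jpg"))
    ]
    missing_annotations = [
        image_id for image_id in image_ids
        if not os.path.isfile(os.path.join(base, "Annotations", image_id + ".xml"))
    ]
    return missing_images, missing_annotations
```

For example, `find_missing_files("data/GODdevkit2019", split="val")` returns two lists of image ids. Both are empty when the split is complete. The `data/GODdevkit2019` location is an assumption here; use wherever the devkit folder actually lives.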
We used ImageNet pretrained weights (VGG16 and ResNets) from Caffe in our experiments. You can download these two models from:
Download them and put them into data/pretrained_model/.
If you want to use PyTorch pre-trained models instead, remember to convert images from BGR to RGB and to apply the same data transform (mean subtraction and normalization) that was used to train the pretrained model.
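The preprocessing described above could look roughly like the sketch below. It is an illustration under stated assumptions, not this repository's data loader: the function name `bgr_to_torchvision_input` is hypothetical, the input is assumed to be an H x W x 3 `uint8` image in BGR order (for example, as returned by OpenCV), and the mean and standard deviation are the ImageNet statistics documented for torchvision's pretrained models.

```python
import numpy as np

# ImageNet channel statistics documented for torchvision's pretrained models
# (RGB order, for pixel values scaled to the [0, 1] range).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def bgr_to_torchvision_input(image_bgr):
    """Convert an HxWx3 uint8 BGR image to a normalized 3xHxW float32 array.

    Hypothetical helper, not part of this repository.
    """
    # Reverse the channel axis: BGR -> RGB.
    image_rgb = image_bgr[:, :, ::-1]
    # Scale pixel values from [0, 255] to [0, 1].
    scaled = image_rgb.astype(np.float32) / 255.0
    # Subtract the per-channel mean and divide by the per-channel std.
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    # Move channels first (C, H, W), the layout PyTorch models expect.
    return np.ascontiguousarray(normalized.transpose(2, 0, 1))
```

The result can be wrapped with `torch.from_numpy(...)` before being passed to a model. The Caffe-pretrained weights used in our experiments expect a different transform, so this sketch applies only when switching to PyTorch pre-trained models.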
@inproceedings{saha2019graphical,
title={Graphical Object Detection in Document Images},
author={Saha, Ranajit and Mondal, Ajoy and Jawahar, CV},
booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)},
pages={51--58},
year={2019},
organization={IEEE}
}