A deep network architecture that independently learns texture patterns, discriminative patches, and shapes to solve various document image analysis tasks.
Figure: Block diagram of the proposed approach. The input image passes through the convolutional layers of a CNN to extract convolutional features. From these convolutional features, the model extracts three kinds of features: an encoding feature, a global feature, and a discriminative feature.
This repository provides the official PyTorch implementation of the journal article:
Document Image Analysis using Deep Multi-modular Features
Jobin K.V., Ajoy Mondal, and C. V. Jawahar
In SNCS 2022
Abstract: Texture or repeating patterns, discriminative patches, and shapes are the salient features for various document image analysis problems. This article proposes a deep network architecture that independently learns texture patterns, discriminative patches, and shapes to solve various document image analysis tasks. The considered tasks are document image classification, genre identification from book covers, scientific document figure classification, and script identification. The presented network learns global, texture, and discriminative features and combines them judiciously based on the nature of the problem to be solved. We compare the performance of the proposed approach with state-of-the-art techniques on multiple publicly available datasets such as Book-Cover, RVL-CDIP, CVSI, and DocFigure. Experiments show that our approach outperforms the state of the art on genre and document figure classification and obtains comparable results on document image and script classification tasks.
conda create --name dmmf python=3.8
conda activate dmmf
conda install -y pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.1 -c pytorch
Install PyTorch Encoding by following that project's installation instructions.
To check the PyTorch Encoding installation, run the following in a Python console:
import encoding
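If you prefer a non-interactive check, the import test above can be wrapped in a short script. The following is a minimal sketch, not part of this repository: the helper name is_installed is hypothetical, and the script only reports whether each module can be located, not whether its version matches the ones pinned above.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Module names taken from the installation steps above.
    for name in ("torch", "torchvision", "encoding"):
        status = "found" if is_installed(name) else "MISSING"
        print(f"{name}: {status}")
```

Run it inside the activated dmmf environment; any module reported as MISSING needs to be installed before training.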
After installing PyTorch Encoding, clone this repository:
git clone https://github.com/jobinkv/Deep_Multi-modular_Features.git
cd Deep_Multi-modular_Features
We use four different datasets: Book-Cover, RVL-CDIP, CVSI, and DocFigure.
cd tools/
python train.py -d 'script' -e exp1 \
    -f 'gedl' -n 'resnext101' \
    -t 2 -l 0.0001 -k 20 -g 16 -c 256 \
    --totalEppoch 40
cd tools/
python eval.py -d 'script' -e exp1 \
    -f 'gedl' -n 'resnext101' \
    -t 2 -l 0.0001 -k 20 -g 16 -c 256
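The evaluation command repeats the training settings and differs only in the script name and the absence of --totalEppoch. One way to keep the two in sync is to build both from a single argument list. This is a sketch under that assumption, not part of the repository: build_command and SHARED_ARGS are hypothetical names, and the flag values are copied verbatim from the commands above without interpreting what they mean.

```python
import shlex

# Settings copied from the train/eval commands above. Their meanings are
# defined by the repository's scripts and are not interpreted here.
SHARED_ARGS = [
    "-d", "script", "-e", "exp1",
    "-f", "gedl", "-n", "resnext101",
    "-t", "2", "-l", "0.0001", "-k", "20", "-g", "16", "-c", "256",
]


def build_command(script, extra_args=()):
    """Return an argv list for `script` using the shared settings."""
    return ["python", script, *SHARED_ARGS, *extra_args]


train_cmd = build_command("train.py", ["--totalEppoch", "40"])
eval_cmd = build_command("eval.py")

if __name__ == "__main__":
    # Print shell-quoted versions that can be pasted into a terminal.
    print(shlex.join(train_cmd))
    print(shlex.join(eval_cmd))
```

Either list can be passed to subprocess.run(..., cwd="tools", check=True) from the repository root to launch the corresponding script.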