Document Image Analysis using Deep Multi-modular Features

Official Project Webpage

A deep network architecture that independently learns texture patterns, discriminative patches, and shapes to solve various document image analysis tasks. PDF

Block diagram of the proposed approach. The input image passes through the convolutional layers of a CNN to extract convolutional features. From these convolutional features, the model extracts three modalities of features: an encoding feature, a global feature, and a discriminative feature.
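To make the three feature types concrete, here is a minimal framework-free NumPy sketch. It is not the repository's PyTorch code. The specific choices are assumptions made for illustration: average pooling for the global feature, soft-assigned residual encoding against a small codebook for the encoding feature, max-pooled 1x1 filter responses for the discriminative feature, and plain concatenation to combine them. All function names, shapes, and the random inputs are hypothetical.

```python
import numpy as np

def global_feature(fmap):
    """Global branch (assumed): average every channel over all positions."""
    return fmap.mean(axis=(1, 2))  # shape (C,)

def discriminative_feature(fmap, patch_filters):
    """Discriminative branch (assumed): score each location with 1x1
    filters and keep every filter's strongest response."""
    c, h, w = fmap.shape
    responses = patch_filters @ fmap.reshape(c, h * w)  # shape (K, H*W)
    return responses.max(axis=1)  # shape (K,)

def encoding_feature(fmap, codewords, smoothing=1.0):
    """Encoding branch (assumed): soft-assign each local descriptor to the
    codewords and sum the weighted residuals, ignoring spatial order."""
    c, h, w = fmap.shape
    descriptors = fmap.reshape(c, h * w).T  # shape (N, C)
    residuals = descriptors[:, None, :] - codewords[None, :, :]  # (N, K, C)
    logits = -smoothing * (residuals ** 2).sum(axis=2)  # (N, K)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over codewords
    encoded = (weights[:, :, None] * residuals).sum(axis=0)  # (K, C)
    return encoded.reshape(-1)

def multi_modular_feature(fmap, patch_filters, codewords):
    """Concatenate the three branches (a simple fusion chosen for this sketch)."""
    return np.concatenate([
        global_feature(fmap),
        encoding_feature(fmap, codewords),
        discriminative_feature(fmap, patch_filters),
    ])

# Hypothetical toy inputs: 8 channels on a 4x4 grid, 5 patch filters, 3 codewords.
rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 4, 4))
patch_filters = rng.standard_normal((5, 8))
codewords = rng.standard_normal((3, 8))

feature = multi_modular_feature(fmap, patch_filters, codewords)
print(feature.shape)  # (37,) = 8 global + 3*8 encoding + 5 discriminative
```

In a trained network the filters and codewords would be learned rather than random, and a classifier would sit on top of the combined vector.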

This repository provides the official PyTorch implementation of the journal article:

Document Image Analysis using Deep Multi-modular Features
Jobin K.V., Ajoy Mondal, and C. V. Jawahar
In SNCS 2022
PDF

Abstract: Texture or repeating patterns, discriminative patches, and shapes are the salient features for various document image analysis problems. This article proposes a deep network architecture that independently learns texture patterns, discriminative patches, and shapes to solve various document image analysis tasks. The considered tasks are document image classification, genre identification from book covers, scientific document figure classification, and script identification. The presented network learns global, texture, and discriminative features and combines them judiciously based on the nature of the problem to be solved. We compare the performance of the proposed approach with state-of-the-art techniques on multiple publicly available datasets such as Book-Cover, RVL-CDIP, CVSI, and DocFigure. Experiments show that our approach outperforms the state of the art on genre and document figure classification and obtains comparable results on document image and script classification tasks.

PyTorch Implementation

Installation

conda create --name dmmf python=3.8
conda activate dmmf
conda install -y pytorch=1.4.0 torchvision=0.5.0 cudatoolkit=10.1 -c pytorch

Install PyTorch-Encoding from here.
To check the PyTorch-Encoding installation, run the following in a Python console:

import encoding
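If you prefer a single check that reports every missing dependency instead of stopping at the first failed import, a small sketch like the following may help. It only uses the standard library. The helper name `is_installed` is hypothetical, and the module list is an assumption based on the installation steps above.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be located (without importing it)."""
    return importlib.util.find_spec(module_name) is not None

# Modules assumed from the installation steps in this README.
for name in ("torch", "torchvision", "encoding"):
    status = "found" if is_installed(name) else "MISSING"
    print(f"{name}: {status}")
```

A module reported as found can still fail at import time (for example, because of a CUDA mismatch), so running `import encoding` remains the definitive check.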

After installing PyTorch-Encoding, clone this repository:

git clone https://github.com/jobinkv/Deep_Multi-modular_Features.git
cd Deep_Multi-modular_Features

The datasets

We use four different datasets:

RVL-CDIP dataset link
DocFigure dataset link
Book cover dataset link
CVSI dataset link

Trained model

Script classification link
Book cover link
DocFigure link
RVL-CDIP link

Train

cd tools/
python train.py -d 'script' -e exp1 \
    -f 'gedl' -n 'resnext101' \
    -t 2 -l 0.0001 -k 20 -g 16 -c 256 \
    --totalEppoch 40

Evaluate

cd tools/
python eval.py -d 'script' -e exp1 \
    -f 'gedl' -n 'resnext101' \
    -t 2 -l 0.0001 -k 20 -g 16 -c 256
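The train and evaluate commands above share most of their flags, so it can be convenient to build both from one list and keep them in sync. The sketch below is one possible way to do that, not part of the repository. The flag values are copied from the commands above; their meanings are not documented here, so they are passed through unchanged. The names `SHARED_ARGS` and `build_command` are hypothetical.

```python
import shlex

# Flags copied verbatim from the train/eval commands in this README.
# Their meanings are not documented here, so they are passed through as-is.
SHARED_ARGS = [
    "-d", "script", "-e", "exp1",
    "-f", "gedl", "-n", "resnext101",
    "-t", "2", "-l", "0.0001", "-k", "20", "-g", "16", "-c", "256",
]

def build_command(script, extra_args=()):
    """Return the argv list for `script` using the shared flags."""
    return ["python", script, *SHARED_ARGS, *extra_args]

train_cmd = build_command("train.py", ["--totalEppoch", "40"])
eval_cmd = build_command("eval.py")

# Print shell-ready command lines for inspection.
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

To actually launch a run, pass one of the lists to `subprocess.run(train_cmd, cwd="tools", check=True)` from the repository root.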
