phoscnet phocnet phosnet historical-document-analysis historical-documents word-recognition icdar2021 icdar zero-shot-learning tensorflow keras-tensorflow neural-networks deep-learning deep-neural-networks spatial-pyramid-pooling

PHOSC-Zero-Shot-Word-Recognition

PHOSNet and Pho(SC)Net are convolutional neural networks that characterize document word images using the proposed PHOS and Pho(SC) embeddings, which can be used for word recognition in a zero-shot setting. This code was used to generate the results for the paper Pho(SC)Net: An Approach Towards Zero-shot Word Image Recognition in Historical Documents.

Implemented using the TensorFlow 2.x framework.
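
As a rough illustration of the zero-shot recognition step, the sketch below assumes that the network outputs an embedding vector for a word image and that recognition picks the lexicon word whose embedding is closest under cosine similarity. The function names, the toy 4-dimensional vectors, and the use of cosine similarity as the matching rule are assumptions made for illustration; the test scripts in this repository may do this differently.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_word(predicted, lexicon_embeddings):
    """Return the lexicon word whose embedding is most similar to `predicted`."""
    return max(
        lexicon_embeddings,
        key=lambda word: cosine_similarity(predicted, lexicon_embeddings[word]),
    )


# Toy, made-up embeddings; real PHOS/PHOC vectors are much longer.
lexicon = {
    "cat": [1, 0, 1, 0],
    "dog": [0, 1, 0, 1],
    "cot": [1, 1, 1, 0],
}
predicted = [0.9, 0.1, 0.8, 0.0]  # hypothetical network output
best = nearest_word(predicted, lexicon)
print(best)
```

Because the lexicon can contain words that never appeared in training, a lookup of this kind is what lets an embedding model label unseen words.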

Requirements

Python 3.8
TensorFlow 2.x
NumPy
Pandas
Matplotlib

Files

File Name	Significance
Alphabet.csv	Stores the count of primary shapes for each letter of the alphabet
aug_images.py	Augments the images in a folder (and saves the augmented copies in the same folder)
phoc_label_generator.py	Generates PHOC vectors
phos_label_generator.py	Generates PHOS vectors
train_<model>.py	Trains the model
test_<model>.py	Tests the performance of a saved model
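
To give a feel for what a PHOC label looks like, here is a minimal sketch of a unigram-only PHOC (pyramidal histogram of characters) vector. It is not the code in phoc_label_generator.py: the alphabet, the pyramid levels 2 to 5, the 50% overlap rule, and the function name `phoc_vector` are assumptions taken from the common PHOC formulation, and bigram features are omitted. PHOS vectors, which rely on the shape counts in Alphabet.csv, are not covered here.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed character set


def phoc_vector(word, levels=(2, 3, 4, 5), alphabet=ALPHABET):
    """Build a binary, unigram-only PHOC vector for `word`.

    At each pyramid level the word is split into `level` equal regions.
    A character is marked present in a region if at least half of the
    character's own span falls inside that region.
    """
    word = word.lower()
    n = len(word)
    vector = []
    for level in levels:
        for region in range(level):
            histogram = [0] * len(alphabet)
            # Positions are scaled by n * level so the arithmetic stays in integers.
            region_start, region_end = region * n, (region + 1) * n
            for k, ch in enumerate(word):
                if ch not in alphabet:
                    continue  # characters outside the alphabet are ignored
                char_start, char_end = k * level, (k + 1) * level
                overlap = min(char_end, region_end) - max(char_start, region_start)
                # The character span has scaled length `level`; require >= 50% overlap.
                if 2 * overlap >= level:
                    histogram[alphabet.index(ch)] = 1
            vector.extend(histogram)
    return vector


vec = phoc_vector("word")
print(len(vec))  # 36 characters * (2 + 3 + 4 + 5) regions
```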

Saved model names follow the convention new_(model_identifier)_(batch_size)_.h5
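
The convention can be expressed as a small helper. This is only a sketch: the function name `saved_model_name` is hypothetical, and the underscore before `.h5` is copied literally from the pattern as written above.

```python
def saved_model_name(model_identifier, batch_size):
    """Build a model file name following new_(model_identifier)_(batch_size)_.h5."""
    return f"new_{model_identifier}_{batch_size}_.h5"


# Example with made-up values
print(saved_model_name("MySimpleDataset", 128))
```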

Additional Directories generated after training/testing

Directory	Significance
Train_History	Saves the training data (losses/similarities) for each epoch
Test_Results	Saves true and predicted labels for each sample in test set
Test_Plots	Saves plots generated by testing module

Plots generated during testing are named as follows: (model_name)_(test_identifier)_(test_setting).png

Test result files are named as follows: (model_name)_(test_identifier)_(test_setting).csv
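
A short sketch of how the two naming patterns map to file paths, assuming the plot and the CSV for one test run share the same stem and live in Test_Plots and Test_Results respectively. The helper name `result_paths` and the example argument values are hypothetical.

```python
from pathlib import Path


def result_paths(model_name, test_identifier, test_setting, root="."):
    """Return the expected plot and result-file paths for one test run."""
    stem = f"{model_name}_{test_identifier}_{test_setting}"
    root = Path(root)
    return {
        "plot": root / "Test_Plots" / f"{stem}.png",
        "results": root / "Test_Results" / f"{stem}.csv",
    }


# Example with made-up identifiers
paths = result_paths("mymodel", "GenConv", "seen")
print(paths["plot"])
print(paths["results"])
```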

Dataset

We provide the dataset used (along with the augmented images).

Code Execution

Consider organizing your dataset in the following directory structure:

📦 Dataset
┣ 📦 Train
┣ 📦 Validation
┣ 📦 Test
┣ 📜 Alphabet.csv
┣ 📜 Test_seen.csv
┣ 📜 Test_unseen.csv
┣ 📜 Train.csv
┣ 📜 Train_unseen.csv
┣ 📜 Valid_seen.csv
┗ 📜 Valid_unseen.csv
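
Before training, it can help to confirm that the folder matches this layout. The following is a minimal sketch of such a check; the helper name `missing_entries` is hypothetical, and the lists simply mirror the tree shown above.

```python
from pathlib import Path

EXPECTED_DIRS = ["Train", "Validation", "Test"]
EXPECTED_FILES = [
    "Alphabet.csv",
    "Test_seen.csv",
    "Test_unseen.csv",
    "Train.csv",
    "Train_unseen.csv",
    "Valid_seen.csv",
    "Valid_unseen.csv",
]


def missing_entries(dataset_root):
    """List the expected sub-directories and CSV files absent from `dataset_root`."""
    root = Path(dataset_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = missing_entries("Dataset")
    print("Dataset layout OK" if not problems else f"Missing: {problems}")
```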

Execute the following:

For data augmentation:
python aug_images.py -op Dataset/Validation -map Train_unseen.csv -np 40 -aug 1
Read through the code once before running this. It adds n copies of the existing images to the same folder.
File names are assigned by number, e.g., if the folder already contains 20 images, the new augmented images are saved as 21.png, 22.png, and so on.

For training a model:
python train_phoscnet.py -idn MySimpleDataset -batch 128 -lr 0.0005 -epoch 200 -mp Dataset/Train.csv -vmap Dataset/Valid_seen.csv -tr Dataset/Train -vi Dataset/Validation

For testing the saved model:
python test_phoscnet.py -model GW_split1_64.h5 -idn GenConv -mp Test_seen.csv -stf Dataset/Train -smap Dataset/Train_unseen.csv -train Dataset/Train
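
The numbering rule for augmented images described above can be previewed with a small sketch. This is one reading of that rule, not the code in aug_images.py: it assumes that only `.png` files count as existing images and that numbering continues from that count. The helper name `next_augmented_names` is hypothetical.

```python
from pathlib import Path


def next_augmented_names(folder, n_new, extension=".png"):
    """Return the file names the next `n_new` augmented images would receive."""
    existing = sum(
        1 for p in Path(folder).iterdir() if p.suffix.lower() == extension
    )
    return [f"{existing + i}{extension}" for i in range(1, n_new + 1)]
```

For a folder that already holds 20 PNG images, `next_augmented_names(folder, 2)` gives `21.png` and `22.png`, matching the example in the text.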

Results

The following image shows prediction results on images from the IAM handwriting dataset.

Citation

If you find the paper or the source code useful to your projects, please cite the following bibtex:

@InProceedings{10.1007/978-3-030-86549-8_2,
author="Rai, Anuj and Krishnan, Narayanan C. and Chanda, Sukalpa",
title="Pho(SC)Net: An Approach Towards Zero-Shot Word Image Recognition in Historical Documents",
booktitle="Document Analysis and Recognition -- ICDAR 2021",
year="2021",
publisher="Springer International Publishing",
address="Cham",
pages="19--33",
isbn="978-3-030-86549-8"
}

About

Implementation of PHOSNet and Pho(SC)Net for Word Recognition in Historical Documents. Implemented using TensorFlow 2.x.

MIT License

Languages

Language:Python 100.0%