This repository applies the architecture proposed in "What You Get Is What You See: A Visual Markup Decompiler" (http://arxiv.org/pdf/1609.04938v1.pdf) to the problem of Handwriting Recognition. The base implementation was done in TensorFlow by ritheshkumar95/im2latex-tensorflow (forked) and was modified for the handwriting task. The original Torch implementation of the paper is located here: https://github.com/harvardnlp/im2markup/blob/master/
What You Get Is What You See: A Visual Markup Decompiler
Yuntian Deng, Anssi Kanervisto, and Alexander M. Rush
http://arxiv.org/pdf/1609.04938v1.pdf
This deep learning framework learns a representation of an image. In our case, the input is an image of handwritten text, which the model converts to an ASCII representation.
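The core mechanism from the paper is a CNN encoder that turns the image into a grid of feature vectors, over which an attention-equipped LSTM decoder attends at each output step. The sketch below is an illustration of that attention step only (not the repo's actual code); the grid and channel sizes mirror the defaults listed later in this README.

```python
import numpy as np

def attention_context(features, query):
    """One attention step: score each feature-grid cell against the decoder
    state, softmax the scores, and return the weighted-sum context vector.

    features: (H*W, D) flattened CNN feature grid
    query:    (D,) projection of the current decoder state
    """
    scores = features @ query                    # (H*W,) alignment scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                     # softmax over grid cells
    context = weights @ features                 # (D,) context vector
    return context, weights

# Toy example with the grid dimensions used elsewhere in this README:
H, W, D = 20, 50, 512
feats = np.random.randn(H * W, D)
query = np.random.randn(D)
ctx, w = attention_context(feats, query)
```

The decoder consumes `ctx` together with the previous character embedding to predict the next output character.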
For example, given an input image of handwritten text, the goal is to infer the corresponding ASCII text, e.g.:
MOVE
- attention.py: file that is run for training and testing
- data_loaders: file that is called by attention.py to load data files
- tflib/: contains network.py and ops.py, which implement the CNN and LSTM architectures in TensorFlow
- scripts/: contains scripts needed to preprocess data
- images/: contains image data
- baseline_model/: contains code from our baseline and milestone models
- att_imgs/: contains images with a visualization of attention
We obtained our dataset from the IAM Handwriting Database 3.0 (http://www.fki.inf.unibe.ch/databases/iam-handwriting-database/download-the-iam-handwriting-database). A sample of these images and the directory structure is included in this repo in the images folder. Follow the steps below to preprocess the image data.
- Download the words dataset from the IAM Handwriting Database and place the words.txt file in the data folder.
- Run the parse raw data script and place the images_path_label.csv file that it creates in the images folder.
python scripts/parse_raw_data.py images/data/words.txt
- Resize all images to have a width of 120 pixels and a height of 50 pixels.
python scripts/resize_images.py images/images_path_label.csv images/
- Preprocess images by cropping out whitespace
python scripts/preprocessing/preprocess_images_handwriting.py --input-dir images/data --output-dir images/processed
- Create a labels file called labels.norm.lst that contains pipe ("|")-separated characters of the ASCII conversion of the corresponding image in images_path_label.csv.
python scripts/preprocessing/preprocess_labels_handwriting.py images/image_path_file.csv images/
- Filter images into train.lst, test.lst, and valid.lst, and move these files to images/.
python scripts/preprocessing/preprocess_filter_handwriting.py
- Lastly, create train, test, and valid buckets to be read from during training.
python scripts/preprocessing/create_buckets.py train
python scripts/preprocessing/create_buckets.py test
python scripts/preprocessing/create_buckets.py valid
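The label step above stores each transcription as pipe-separated characters. A minimal sketch of that conversion (assuming plain ASCII labels; the actual preprocessing script may handle more cases):

```python
def to_pipe_label(text):
    """Join the characters of an ASCII label with '|', as in labels.norm.lst."""
    return "|".join(text)

print(to_pipe_label("MOVE"))  # → M|O|V|E
```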
Now, we are finally ready to train our model. You can do this by running:
python attention.py
Default hyperparameters used:
- BATCH_SIZE = 16
- EMB_DIM = 60
- ENC_DIM = 256
- DEC_DIM = ENC_DIM*2
- D = 512 (#channels in feature grid)
- V = 502 (vocab size)
- NB_EPOCHS = 50
- H = 20 (Maximum height of feature grid)
- W = 50 (Maximum width of feature grid)
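The defaults above, collected into a single Python mapping for reference (DEC_DIM is derived from ENC_DIM exactly as in the list):

```python
# Default hyperparameters, as listed in this README.
ENC_DIM = 256
HYPERPARAMS = {
    "BATCH_SIZE": 16,
    "EMB_DIM": 60,
    "ENC_DIM": ENC_DIM,
    "DEC_DIM": ENC_DIM * 2,  # 512
    "D": 512,                # channels in the CNN feature grid
    "V": 502,                # vocabulary size
    "NB_EPOCHS": 50,
    "H": 20,                 # maximum height of the feature grid
    "W": 50,                 # maximum width of the feature grid
}
```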
You can use the following flags to set additional hyperparameters:
- --lr: learning rate
- --decay_rate: decay rate
- --num_epochs: number of epochs
- --num_iterations: number of iterations
- --optimizer: type of optimizer (sgd, adam, rmsprop)
- --batch_size: batch size
- --embedding_size: embedding size
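A hypothetical sketch of how these flags could map onto an argparse parser (attention.py's real argument handling and default values may differ; the defaults shown here are assumptions):

```python
import argparse

def build_parser():
    """Sketch of a CLI parser for the flags listed above (not the repo's actual code)."""
    p = argparse.ArgumentParser(description="Train the handwriting recognition model")
    p.add_argument("--lr", type=float, default=0.001, help="learning rate")
    p.add_argument("--decay_rate", type=float, default=0.95, help="decay rate")
    p.add_argument("--num_epochs", type=int, default=50, help="number of epochs")
    p.add_argument("--num_iterations", type=int, default=None, help="number of iterations")
    p.add_argument("--optimizer", choices=["sgd", "adam", "rmsprop"], default="adam",
                   help="type of optimizer")
    p.add_argument("--batch_size", type=int, default=16, help="batch size")
    p.add_argument("--embedding_size", type=int, default=60, help="embedding size")
    return p

# Example: override the learning rate and optimizer from the command line.
args = build_parser().parse_args(["--lr", "0.01", "--optimizer", "sgd"])
```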
The predict() function in the attention.py script can be called to generate predictions on the validation or test sets. If you call this function with visualization turned on, it saves images indicating where attention was placed for each output character.
"m"
"i"
"g"
"h"
"t"
"#END”
The code for our baseline and milestone models can be found in the folder baseline_model.