
The Learnable Typewriter
A Generative Approach to Text Analysis

Official PyTorch implementation of The Learnable Typewriter: A Generative Approach to Text Analysis.
Authors: Yannis Siglidis, Nicolas Gonthier, Julien Gaubil, Tom Monnier, Mathieu Aubry.
Research Institute: Imagine, LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, Marne-la-Vallée, France
ICDAR 2024 (Best Paper Award).

Install 🌱

conda create --name ltw pytorch==1.9.1 torchvision==0.10.1 cudatoolkit=11.3 -c pytorch -c conda-forge
conda activate ltw
python -m pip install -r requirements.txt

Datasets ☀️ Models 🔨

Dropbox: Download & extract datasets.zip and runs.zip in the parent folder.
Huggingface: python scripts/download-hf.py

Inference 🍑

For minimal inference and plotting, we provide a standalone notebook, which can also be opened in Colab.

To reproduce the figures of the paper, run the scripts/figures.ipynb notebook.

Helper scripts are also provided to perform evaluation on the corresponding datasets:

python scripts/eval.py -i <MODEL-PATH> {--eval, --eval_best}

and produce figures and sprites for certain samples:

python scripts/eval.py -i <MODEL-PATH> -s {train, val, test} -id 0 0 0 -is 1 2 3 --plot_sprites

Training 🌼

Training and model configuration are handled through Hydra. We supply the corresponding config files for all our baseline experiments.

Google 📰

python scripts/train.py supervised-google.yaml
python scripts/train.py unsupervised-google.yaml

Copiale 📜

python scripts/train.py supervised-copiale.yaml
python scripts/train.py unsupervised-copiale.yaml

Fontenay ⛪

python scripts/train.py supervised-fontenay.yaml

and finetune with:

python scripts/fontenay.py -i fontenay/fontenay/<MODEL_NAME> -o fontenay/fontenay-ft/ --max_epochs 150 -k "training.optimizer.lr=0.001"

Any of the experiment config files above can be further modified with additional command-line overrides, using the Hydra override syntax.
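The dotted key in the finetuning command above (`training.optimizer.lr=0.001`) addresses a nested config entry. As a rough illustration of that mapping, the sketch below applies `key.path=value` strings to a nested dictionary. It is not Hydra's implementation: `apply_overrides`, `parse_value`, and the `base` dictionary are hypothetical names chosen for this example, and Hydra's real override grammar is richer (it also covers adding and deleting keys, among other things).

```python
import json
from copy import deepcopy


def parse_value(raw):
    """Interpret the right-hand side as JSON when possible, else keep the string."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw


def apply_overrides(config, overrides):
    """Return a copy of `config` with each `dotted.key=value` override applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical config fragment: only the key `training.optimizer.lr`
# is taken from the commands in this README, the base value is made up.
base = {"training": {"optimizer": {"lr": 0.01}}}
updated = apply_overrides(base, ["training.optimizer.lr=0.001"])
print(updated["training"]["optimizer"]["lr"])  # 0.001
```

The original dictionary is left untouched, which mirrors the idea that the YAML file on disk stays the same and only the run's effective configuration changes.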

Custom Dataset 💾

Trying the Learnable Typewriter on a new dataset is dead easy.

First create a config file:

configs/<DATASET_ID>.yaml

...

DATASET-TAG:
  path: <DATASET-NAME>/
  sep: ''                    # How the character separator is denoted in the annotation. 
  space: ' '                 # How the space is denoted in the annotation.
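To make the roles of `sep` and `space` more concrete, here is a minimal sketch of how a label string could be turned into character tokens. The function `split_label` is hypothetical and the behaviour is an assumption based only on the comments above (an empty `sep` meaning one token per character), so the repository's actual parsing may differ.

```python
def split_label(label, sep="", space=" "):
    """Split an annotation label into character tokens.

    Assumption: an empty `sep` means every character is its own token,
    while a non-empty `sep` delimits (possibly multi-character) tokens.
    Tokens equal to `space` are normalised to a plain " ".
    """
    if sep == "":
        tokens = list(label)
    else:
        # Drop empty pieces produced by leading, trailing or doubled separators.
        tokens = [t for t in label.split(sep) if t]
    return [" " if t == space else t for t in tokens]


# Default convention from the config above: no separator, plain space.
print(split_label("a cat"))
# A made-up annotation style with "|" between symbols and "_" for spaces.
print(split_label("a|_|c|a|t", sep="|", space="_"))
```

Under these assumptions both calls yield the same token sequence, which is the point of declaring the two symbols in the config: different annotation conventions map onto one internal representation.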

Then create the dataset folder:

datasets/<DATASET-NAME>
├── annotation.json
└── images
  ├── <image_id>.jpg
  └── ...

The annotation.json file should be a dictionary with entries of the form:

    "<image_id>": {
        "split": "train",                            # {"train", "val", "test"} - "val" is ignored in the unsupervised case.
        "label": "A beautiful calico cat."           # The text that corresponds to this line.
    },
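Note that the `#` comments in the entry above are explanatory only; a real JSON file cannot contain comments. One way to avoid hand-editing is to generate the file from Python. The sketch below is an assumption-laden example, not part of the repository: `write_annotation` and the image ids are made up for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_annotation(dataset_dir, lines):
    """Write `annotation.json` from (image_id, split, label) triples."""
    allowed = {"train", "val", "test"}
    annotation = {}
    for image_id, split, label in lines:
        if split not in allowed:
            raise ValueError(f"unknown split {split!r} for {image_id!r}")
        annotation[image_id] = {"split": split, "label": label}
    path = Path(dataset_dir) / "annotation.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps non-ASCII characters readable in the file.
    path.write_text(
        json.dumps(annotation, ensure_ascii=False, indent=4), encoding="utf-8"
    )
    return path


# Example with made-up image ids, written to a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    out = write_annotation(tmp, [
        ("line_0001", "train", "A beautiful calico cat."),
        ("line_0002", "test", "Another line of text."),
    ])
    print(out.read_text(encoding="utf-8"))
```

In practice `dataset_dir` would point at `datasets/<DATASET-NAME>`, with each `image_id` matching a file name in `images/` without its extension.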

You can completely ignore the annotation.json file in the case of unsupervised training without evaluation.
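Before launching a run, it can help to check that the folder layout and the annotations agree. The following is a small sketch of such a check; `check_dataset` is a hypothetical helper, and it assumes that images use the `.jpg` extension shown above and that every annotated entry carries both a `split` and a `label`.

```python
import json
from pathlib import Path


def check_dataset(dataset_dir, image_suffix=".jpg"):
    """Return a list of human-readable problems found in a dataset folder."""
    root = Path(dataset_dir)
    images_dir = root / "images"
    if not images_dir.is_dir():
        return [f"missing folder: {images_dir}"]
    image_ids = {p.stem for p in images_dir.iterdir() if p.suffix == image_suffix}

    annotation_path = root / "annotation.json"
    if not annotation_path.is_file():
        # Acceptable for unsupervised training without evaluation.
        return []

    problems = []
    annotation = json.loads(annotation_path.read_text(encoding="utf-8"))
    for image_id, entry in annotation.items():
        if entry.get("split") not in {"train", "val", "test"}:
            problems.append(f"{image_id}: invalid split {entry.get('split')!r}")
        if not isinstance(entry.get("label"), str):
            problems.append(f"{image_id}: label is missing or not a string")
        if image_id not in image_ids:
            problems.append(f"{image_id}: no matching image in {images_dir}")
    return problems
```

Calling `check_dataset("datasets/<DATASET-NAME>")` returns an empty list when nothing looks wrong; a missing `annotation.json` is deliberately not reported, since the file is optional in the unsupervised case.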

Logging 📉

Logging is done through tensorboard. To visualize results run:

tensorboard --logdir ./<run_dir>/

If you want to dive in deeper, check out our experimental features.

Citing 💫

@misc{the-learnable-typewriter,
	title = {The Learnable Typewriter: A Generative Approach to Text Line Analysis},
	author = {Siglidis, Ioannis and Gonthier, Nicolas and Gaubil, Julien and Monnier, Tom and Aubry, Mathieu},
	publisher = {arXiv},
	year = {2023},
	url = {https://arxiv.org/abs/2302.01660},
	keywords = {Computer Vision and Pattern Recognition (cs.CV), FOS: Computer and information sciences},
	doi = {10.48550/ARXIV.2302.01660},
	copyright = {Creative Commons Attribution 4.0 International}
}

Also check out 🌈

If you like this project, also have a look at related work produced by our team:

Acknowledgements ✨

We would like to thank Malamatenia Vlachou and Dominique Stutzmann for sharing ideas, insights and data for applying our method in paleography; Vickie Ye and Dmitriy Smirnov for useful insights and discussions; Romain Loiseau, Mathis Petrovich, Elliot Vincent, Sonat Baltacı for manuscript feedback and constructive insights. This work was partly supported by the European Research Council (ERC project DISCOVER, number 101076028), ANR project EnHerit ANR-17-CE23-0008, ANR project VHS ANR-21-CE38-0008 and HPC resources from GENCI-IDRIS (2022-AD011012780R1, AD011012905).

About

The Learnable Typewriter: A Generative Approach to Text Line Analysis

http://imagine.enpc.fr/~siglidii/learnable-typewriter/