DHd 2019 Workshop - Automatic Text and Feature Recognition: Mit READ Werkzeugen Texte erkennen und Dokumente analysieren
This repo contains the material for the session on Automatic Feature Recognition.
We will use the dhSegment tool. To install it, follow the installation procedure described in its documentation and create a Python environment.
We will also make use of a Jupyter Notebook to get through the steps and visualize the results.
Images and annotations
The images used for this workshop are taken from Gallica, the digital library of the Bibliothèque Nationale de France (BnF).
You'll find the identifiers and links in the INFO.md file in the downloaded images folder.
With their identifier, the images can be downloaded in full resolution using the Pyllica tool.
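As an illustration of how an identifier maps to a full-resolution image, here is a minimal sketch that builds a download URL by hand. It does not use Pyllica; it assumes Gallica's IIIF image endpoint follows the pattern shown in `GALLICA_IIIF_BASE`, and the function names and the identifier in the usage note are hypothetical placeholders. Take the real identifiers from INFO.md.

```python
from urllib.parse import quote
from urllib.request import urlretrieve

# Assumption: Gallica serves images through an IIIF endpoint with this base URL.
GALLICA_IIIF_BASE = "https://gallica.bnf.fr/iiif/ark:/12148"


def gallica_iiif_url(identifier: str, page: int = 1) -> str:
    """Build the assumed full-resolution IIIF URL for one page of a document."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    # region=full, size=full, rotation=0, quality/format=native.jpg
    return (
        f"{GALLICA_IIIF_BASE}/{quote(identifier, safe='')}"
        f"/f{page}/full/full/0/native.jpg"
    )


def download_page(identifier: str, page: int, target_path: str) -> str:
    """Download one page to target_path (hypothetical helper, needs network access)."""
    urlretrieve(gallica_iiif_url(identifier, page), target_path)
    return target_path
```

For example, `gallica_iiif_url("btv1b0000000x", 3)` builds the URL for the third page of a document with the placeholder identifier `btv1b0000000x`. For bulk downloads, the Pyllica tool mentioned above is the more convenient route.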
The ground-truth annotations were produced by BnF's operators.
Trained model
The weights of the trained model can be downloaded here.