DHd 2019 Workshop - Automatic Text and Feature Recognition: Mit READ Werkzeugen Texte erkennen und Dokumente analysieren

This repo contains the material for the session on Automatic Feature Recognition.

Graphical content extraction using dhSegment

Installation

We will use the tool dhSegment. To install it, follow the installation procedure described in its documentation and create a Python environment.

We will also use a Jupyter Notebook to walk through the steps and visualize the results.

Downloads

Images and annotations

The images used for this workshop are taken from Gallica, the digital library of the Bibliothèque Nationale de France (BnF). You'll find the identifiers and links in the INFO.md file in the downloaded images folder.

Given their identifiers, the images can be downloaded in full resolution with the Pyllica tool.
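For readers who want to see how an identifier maps to a full-resolution image, here is a minimal sketch that builds a download URL by hand. It assumes Gallica's public IIIF image endpoint and its usual URL layout; it is not how Pyllica itself is invoked. The function name `gallica_iiif_url` and the identifier in the example are hypothetical placeholders, so substitute a real identifier from INFO.md.

```python
from urllib.parse import quote

# Assumption: Gallica serves images through an IIIF endpoint at this base URL.
GALLICA_IIIF_BASE = "https://gallica.bnf.fr/iiif/ark:/12148"


def gallica_iiif_url(identifier: str, page: int = 1, size: str = "full") -> str:
    """Build the URL of one page of a Gallica document (hypothetical helper).

    identifier: the document identifier, e.g. as listed in INFO.md
    page:       1-based page (folio) number within the document
    size:       IIIF size parameter; "full" requests the full resolution
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    ident = quote(identifier.strip(), safe="")
    # IIIF layout: {identifier}/f{page}/{region}/{size}/{rotation}/{quality}.{format}
    return f"{GALLICA_IIIF_BASE}/{ident}/f{page}/full/{size}/0/native.jpg"


if __name__ == "__main__":
    # "btv1b0000000x" is a made-up placeholder, not a real document.
    print(gallica_iiif_url("btv1b0000000x", page=1))
```

The resulting URL can be fetched with any HTTP client. For bulk downloads, Pyllica remains the more convenient choice.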

The ground-truth annotations were produced by BnF operators.

Trained model

The weights of the trained model can be downloaded here.
