Robert Sachunsky's repositories
sbb_web-integration
Visualization of NER+EL+Topic Modelling + Image-Search
sbb_images
Annotation Tool and Image Search
page2tsv
PAGE-XML to TSV
sbb_utils
shared functionality
sbb_tools
Digitized Collections of the Berlin State Library: ALTO-XML Processing Tools / batch NER + EL / BERT-pre-training
sbb_topic-modelling
Topic Modelling
sbb_knowledge-base
Wikidata + Wikipedia Knowledge-Base Extraction for EL-purposes
sbb_ocr_postcorrection
Two-Step Approach to OCR Post-Correction
sbb_ned
Named Entity Disambiguation and Linking
sbb_ner
Named Entity Recognition
dh-datenkompetenz2024-ocr
Slides and materials for a contribution to the DH lecture series (Ringvorlesung) in summer semester 2024 at TUD
gt-repo-scripts
XSLT and shell scripts for analyzing and creating GitHub pages of a ground truth repository. These are centrally managed and can be used by all repositories created with gt-repo-template (https://github.com/OCR-D/gt-repo-template).
ocrd-demo-2021-05-12
Demos for OCR-D presentation at OCR@vDHd
tesstrain
Train Tesseract LSTM with make
dta-tools
Tools used in the project "Deutsches Textarchiv"
ocrd_monitor
Web frontend for ocrd_manager
tessdoc
Tesseract documentation
tesserocr
A Python wrapper for the tesseract-ocr API
tesseract
Tesseract Open Source OCR Engine (main repository)
dta-lexdb-applications
formatting and integrating the Deutsches Textarchiv dictionary into various applications
mkn-test-gt
My DHd24 GT experience
mygt
mydesc
htr-united
Ground Truth Resources for the HTR of patrimonial documents
ocrd_keraslm
Simple character-based language model using keras
kraken
OCR engine for all the languages
ddb-metadata-schematron-validation
Schematron validations by the Fachstelle Bibliothek of the Deutsche Digitale Bibliothek