IDM 4 Data Science (qurator-spk)

Organization data from GitHub: https://github.com/qurator-spk

IDM 4 Data Science @StabiBerlin

Location: Berlin

Home Page: https://mmk.sbb.berlin

GitHub: @qurator-spk

IDM 4 Data Science's repositories

eynollah

Document Layout Analysis

Language: Python | License: Apache-2.0 | Stargazers: 391 | Issues: 18 | Issues: 85

sbb_textline_detection

Detect textlines in document images

Language: Python | License: Apache-2.0 | Stargazers: 92 | Issues: 9 | Issues: 30

sbb_binarization

Document Image Binarization

Language: Python | License: Apache-2.0 | Stargazers: 78 | Issues: 5 | Issues: 34

dinglehopper

An OCR evaluation tool

Language: Python | License: Apache-2.0 | Stargazers: 68 | Issues: 5 | Issues: 90

neat

Named entity annotation tool

Language: JavaScript | License: Apache-2.0 | Stargazers: 28 | Issues: 5 | Issues: 47

sbb_ner

Named Entity Recognition

Language: Python | License: Apache-2.0 | Stargazers: 18 | Issues: 6 | Issues: 4

sbb_ned

Named Entity Disambiguation and Linking

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 16 | Issues: 3 | Issues: 3

sbb_images

Image Annotation Tool and Image Search

sbb_ocr_postcorrection

Two-Step Approach to OCR Post-Correction

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 14 | Issues: 3 | Issues: 5

mods4pandas

Extract the MODS/ALTO metadata of a bunch of METS/ALTO files into pandas DataFrames for data analysis

Language: Python | License: Apache-2.0 | Stargazers: 12 | Issues: 2 | Issues: 65

sbb_pixelwise_segmentation

Obsolete repo, merged into eynollah

Language: Python | License: Apache-2.0 | Stargazers: 12 | Issues: 5 | Issues: 16

ocrd-galley

A Dockerized test environment for OCR-D processors 🚢

Language: Shell | License: Apache-2.0 | Stargazers: 7 | Issues: 4 | Issues: 76

page2tsv

PAGE-XML to TSV

Language: Python | License: Apache-2.0 | Stargazers: 4 | Issues: 4 | Issues: 7

ZEFYS2025

Stabi Berlin dataset for NER

License: CC-BY-4.0 | Stargazers: 4 | Issues: 0 | Issues: 0

ocrd_repair_inconsistencies

Automatically re-order lines, words and glyphs to become textually consistent with their parents.

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 2 | Issues: 6

ocrd_trocr

OCR-D processor for TrOCR

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 3 | Issues: 6

publications

Qurator-SPK team publications

ocrd_calamari

Recognize text using Calamari OCR and the OCR-D framework

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

PyTorch-YOLOv3

Minimal PyTorch implementation of YOLOv3

Language: Python | License: GPL-3.0 | Stargazers: 1 | Issues: 1 | Issues: 0

sbb_knowledge-base

Wikidata + Wikipedia Knowledge-Base Extraction for EL-purposes

Language: Python | Stargazers: 1 | Issues: 1 | Issues: 0

sbb_tools

Digitalized Collections of the Berlin State Library: ALTO-XML Processing Tools / batch NER + EL / BERT-pre-training

Language: Python | Stargazers: 1 | Issues: 2 | Issues: 0

sbb_web-integration

Visualization of NER+EL+Topic Modelling + Image-Search

Language: JavaScript | Stargazers: 1 | Issues: 1 | Issues: 0

setuptools_ocrd

Manage your package version through ocrd-tool.json

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 2 | Issues: 15

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

abbyy-to-alto

Converts FineReader abbyy.xml to alto.xml.

Language: Java | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

download-gitter.im-chat

tiny tool to download gitter.im chat

Language: Perl | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

sbb_ner_hf

sbb ner finetuning with huggingface

Language: Jupyter Notebook | Stargazers: 0 | Issues: 0 | Issues: 0

sbb_topic-modelling

Topic Modelling

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0

sbb_utils

shared functionality

Language: Python | Stargazers: 0 | Issues: 1 | Issues: 0