There are 2 repositories under the historical-newspapers topic.
The GeoNewsMiner (GNM): An interactive spatial humanities tool to visualize geographical references in historical newspapers
Convert ALTO XML to plain text plus minimal metadata
Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
Awesome historical newspaper analysis tools and literature
Tools for the use of Tesseract OCR in R
Repository of JSON schemas used in the Impresso project.
Dataset from the paper "Information Extraction from Public Meeting Articles"
The Hongkong News headline analysis project was conducted by the Chinese University of Hong Kong Library.
Everything to reproduce the CLEF HIPE 2020 campaign results.
This repository contains code and sample data related to running the impresso corpus through the text reuse detection software passim.