Media Monitoring of the Past's repositories
CLEF-HIPE-2020
Identifying Historical People, Places and other Entities: Shared Task on Named Entity Recognition and Linking on Historical Newspapers at CLEF 2020.
impresso-text-acquisition
🛠️ Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
impresso-frontend
🚀 The frontend application of the Impresso WebApp http://impresso-project.ch/app
impresso-datalab-notebooks
🔬 Impresso Datalab Notebooks
impresso-interface-review
Survey of digitized newspaper interfaces
impresso-pycommons
Python module with code (objects, functions) that is highly reusable within impresso.
impresso-schemas
Repository of JSON schemas used in the Impresso project.
impresso-datalab-starter-pack
This repository provides a basic Python notebook setup with some preinstalled Python libraries. It includes a Dockerfile for building a Docker image containing the necessary environment and a requirements.txt file listing the required Python libraries.
impresso-docker-stack
Docker stack for the Impresso app
dataset-challenge-lid
Ground truth dataset with language identification information for challenging news articles
impresso-user-admin
Basic Django admin to manage user-related data in Impresso's Master DB.
llm-transcript-postcorrection
Work on OCR/ASR/HTR post-correction.
newsagency-classification
Recognition of news agency mentions in historical news articles (BERT-based token classification).
transmedia
Website for the Transmedia History Conference
impresso-datalab
Impresso Datalab static Astro website
digital-history-ch-2024
Extended abstract for the Digital History Switzerland conference using the official template
impresso-data-sanitycheck
Code to perform sanity checks on the acquired newspaper data.
impresso-essentials
⚙️ Python package with highly reusable modules and functions within impresso.
impresso-jscommons
Reusable components for impresso-frontend and impresso-middle-layer
impresso-linguistic-processing
Code for running spaCy on rebuilt impresso data.
impresso-middle-layer
Middle layer API
impresso-passim
This repository contains code and sample data related to running the impresso corpus through the text reuse detection software passim.
impresso-py
Impresso Python Library to interact with the Impresso Public API
impresso-text-embedder
Multilingual text vectorizer for semantic search and comparison
transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.