Daniel Stoekl's repositories
sofer_mahir
HTR project on big manuscripts of Rabbinic treatises from the Tannaitic period
Ancient-Greek-BERT
Pre-trained BERT models for Ancient and Medieval Greek, and associated code for the LaTeCH 2021 paper "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"
bible-clusterer
Web application to perform clustering of text data on LXX and SBL Greek New Testament
Data-Processing
Tools for initial data processing to populate database.
gpt4all
gpt4all: a chatbot trained on a massive collection of clean assistant data including code, stories and dialogue
jpeg-sandbox
Interactively edit individual DCT blocks in any JPEG image in the browser.
layout-parser
A Python Library for Document Layout Understanding
modern_practical_nlp
This course covers how you can use NLP to do stuff.
named-entity-recognition
Notebooks for teaching Named Entity Recognition at the Cultural Heritage Data School, run by Cambridge Digital Humanities, June-July 2020
NN-SVG
Publication-ready NN-architecture schematics.
PlotNeuralNet
LaTeX code for drawing neural network diagrams
RambaNet
Authorship attribution of ancient books using CNN (under active development).
segment-anything
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
SmartScript
Old text recognition in Hebrew
sunfish
Sunfish: a Python Chess Engine in 111 lines of code
text-fabric
File format, model, API, and apps for manipulating text and its annotated features
unikud
Hebrew nikud with transformers
WatermarkReco
PyTorch implementation of the paper "Large-Scale Historical Watermark Recognition: dataset and a new consistency-based approach"
yap
Yet Another (natural language) Parser