Paul Groth's starred repositories
DeepSpeech
DeepSpeech is an open-source embedded (offline, on-device) speech-to-text engine that can run in real time on devices ranging from a Raspberry Pi 4 to high-power GPU servers.
differential-datalog
DDlog is a programming language for incremental computation. It is well suited for writing programs that continuously update their output in response to input changes. A DDlog programmer does not write incremental algorithms; instead they specify the desired input-output mapping in a declarative manner.
semantic-python-overview
A (subjective) overview of projects related to both Python and semantic technologies (RDF, OWL, reasoning, ...)
grimoirelab
GrimoireLab: platform for software development analytics and insights
cc2dataset
Easily convert Common Crawl to a dataset of captions and documents (image/text, audio/text, video/text, ...).
nlp-labelling
Labelling platform for text using weak supervision.
record-linkage-tutorial
A tutorial on entity resolution (record linkage or de-duplication)
RFC-Security-Research
Paper, data, and code from "Investigating Potential Security Vulnerability Manifestation through Various Analyses & Inferences Regarding Internet RFCs"
workshop_data_viz
Data visualization workshop (Ams Data Science Center, Feb 2022)
text-alpha
Python implementation of character-level, textual inter-annotator agreement with Krippendorff's alpha.
data-discovery
Leveraging table semantics for data or knowledge discovery