anetschka's starred repositories
deepdoctection
A Repo For Document AI
langdetect
Port of Google's language-detection library to Python.
TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
tensorstore
Library for reading and writing large multi-dimensional arrays.
pyahocorasick
Python module (C extension and plain python) implementing Aho-Corasick algorithm
ACL-anthology-corpus
This repository provides details and links to the ACL anthology corpus/collection including .bib, .pdf and grobid extractions of the pdfs
germalemma
A lemmatizer for German language text
wikipedia2corpus
Wikipedia text corpus for self-supervised NLP model training
UkrainianLT
A collection of links to Ukrainian language tools
german_compound_splitter
Compound splitter for German language ("Komposita-Zerlegung") based on large dictionary combined with highly efficient multi-pattern string search
rollinglda
A rolling version of the Latent Dirichlet Allocation.
keyword-selection
The implementation of "Domain Representative Keywords Selection: A Probabilistic Approach" (Findings of ACL '22)
german_legal_sentences
A dataset of semantically related sentence pairs in the German legal domain