anetschka's starred repositories

shap

A game theoretic approach to explain the output of any machine learning model.

Language: Jupyter Notebook · License: MIT · Stargazers: 21958 · Issues: 241 · Issues: 2468

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python · License: Apache-2.0 · Stargazers: 10444 · Issues: 195 · Issues: 2114

LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python · License: MIT · Stargazers: 9523 · Issues: 64 · Issues: 102

metaseq

Repo for external large-scale work

Language: Python · License: MIT · Stargazers: 6415 · Issues: 109 · Issues: 292

deepdoctection

A Repo For Document AI

Language: Python · License: Apache-2.0 · Stargazers: 2307 · Issues: 16 · Issues: 169

setfit

Efficient few-shot learning with Sentence Transformers

Language: Jupyter Notebook · License: Apache-2.0 · Stargazers: 2048 · Issues: 20 · Issues: 291

langdetect

Port of Google's language-detection library to Python.

Language: Python · License: NOASSERTION · Stargazers: 1655 · Issues: 26 · Issues: 76

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python · License: Apache-2.0 · Stargazers: 1543 · Issues: 34 · Issues: 229

tensorstore

Library for reading and writing large multi-dimensional arrays.

Language: C++ · License: NOASSERTION · Stargazers: 1309 · Issues: 29 · Issues: 129

pyahocorasick

Python module (C extension and plain Python) implementing the Aho-Corasick algorithm

Language: C · License: BSD-3-Clause · Stargazers: 904 · Issues: 22 · Issues: 127

TS-TCC

[IJCAI-21] "Time-Series Representation Learning via Temporal and Contextual Contrasting"

Language: Python · License: MIT · Stargazers: 325 · Issues: 4 · Issues: 32

segtok

Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic features.

Language: Python · License: MIT · Stargazers: 168 · Issues: 11 · Issues: 20

ACL-anthology-corpus

This repository provides details and links to the ACL Anthology corpus/collection, including .bib files, PDFs, and GROBID extractions of the PDFs.

Language: Jupyter Notebook · Stargazers: 164 · Issues: 7 · Issues: 3

simstring

A Python implementation of SimString, a simple and efficient algorithm for approximate string matching.

Language: Python · License: MIT · Stargazers: 119 · Issues: 5 · Issues: 3

germalemma

A lemmatizer for German language text

Language: Python · License: Apache-2.0 · Stargazers: 86 · Issues: 13 · Issues: 4

lafand-mt

MAFAND-MT

Language: Jupyter Notebook · License: GPL-3.0 · Stargazers: 50 · Issues: 1 · Issues: 2

MELM

Code for "MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER"

Language: Python · License: MIT · Stargazers: 40 · Issues: 0 · Issues: 0

wikipedia2corpus

Wikipedia text corpus for self-supervised NLP model training

Language: Python · License: MIT · Stargazers: 35 · Issues: 2 · Issues: 1

UkrainianLT

A collection of links to Ukrainian language tools

german_compound_splitter

Compound splitter for the German language ("Komposita-Zerlegung", i.e. compound decomposition) based on a large dictionary combined with highly efficient multi-pattern string search

Language: Python · License: CC-BY-4.0 · Stargazers: 18 · Issues: 2 · Issues: 3

slurk

Slurk (think “slack for mechanical turk”…) is a lightweight and easily extensible chat server built especially for conducting multimodal dialogue experiments or data collections.

Language: Python · License: BSD-3-Clause · Stargazers: 15 · Issues: 5 · Issues: 89

rollinglda

A rolling version of Latent Dirichlet Allocation (LDA).

Language: R · License: GPL-3.0 · Stargazers: 11 · Issues: 2 · Issues: 3

GerDaLIR

German Dataset for Legal Information Retrieval

License: MIT · Stargazers: 11 · Issues: 6 · Issues: 0

keyword-selection

The implementation of "Domain Representative Keywords Selection: A Probabilistic Approach" (Findings of ACL '22)

Language: C++ · Stargazers: 10 · Issues: 0 · Issues: 0

german_legal_sentences

A dataset of semantically related sentence pairs in the German legal domain

Stargazers: 9 · Issues: 0 · Issues: 0

corpus

German T5 training corpus

Language: Jupyter Notebook · Stargazers: 3 · Issues: 8 · Issues: 2