anetschka

anetschka's starred repositories

shap

A game theoretic approach to explain the output of any machine learning model.

Language: Jupyter Notebook | MIT | 21958 241 2468

NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Language: Python | Apache-2.0 | 10444 195 2114

LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Language: Python | MIT | 9523 64 102

metaseq

Repo for external large-scale work

Language: Python | MIT | 6415 109 292

deepdoctection

A Repo For Document AI

Language: Python | Apache-2.0 | 2307 16 169

setfit

Efficient few-shot learning with Sentence Transformers

Language: Jupyter Notebook | Apache-2.0 | 2048 20 291

langdetect

Port of Google's language-detection library to Python.

Language: Python | NOASSERTION | 1655 26 76

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.

Language: Python | Apache-2.0 | 1543 34 229

tensorstore

Library for reading and writing large multi-dimensional arrays.

Language: C++ | NOASSERTION | 1309 29 129

pyahocorasick

Python module (C extension and plain Python) implementing the Aho-Corasick algorithm

Language: C | BSD-3-Clause | 904 22 127

TS-TCC

[IJCAI-21] "Time-Series Representation Learning via Temporal and Contextual Contrasting"

Language: Python | MIT | 325 4 32

segtok

Segtok v2 is here: https://github.com/fnl/syntok -- A rule-based sentence segmenter (splitter) and a word tokenizer using orthographic features.

Language: Python | MIT | 168 11 20

ACL-anthology-corpus

This repository provides details of and links to the ACL Anthology corpus/collection, including .bib files, .pdf files, and GROBID extractions of the PDFs

Language: Jupyter Notebook | 164 7 3

simstring

A Python implementation of SimString, a simple and efficient algorithm for approximate string matching.

Language: Python | MIT | 119 5 3

germalemma

A lemmatizer for German language text

Language: Python | Apache-2.0 | 86 13 4

lafand-mt

MAFAND-MT

Language: Jupyter Notebook | GPL-3.0 | 50 1 2

MELM

Code for "MELM: Data Augmentation with Masked Entity Language Modeling for Low-Resource NER"

Language: Python | 43 1 12

TransSHAP

Language: Python | MIT | 4000

wikipedia2corpus

Wikipedia text corpus for self-supervised NLP model training

Language: Python | MIT | 35 2 1

UkrainianLT

A collection of links to Ukrainian language tools

CC0-1.0 | 26 5 2

german_compound_splitter

Compound splitter for the German language ("Komposita-Zerlegung") based on a large dictionary combined with highly efficient multi-pattern string search

Language: Python | CC-BY-4.0 | 18 2 3

slurk

Slurk (think “slack for mechanical turk”…) is a lightweight and easily extensible chat server built especially for conducting multimodal dialogue experiments or data collections.

Language: Python | BSD-3-Clause | 15 5 89