gentaiscool

Genta Indra Winata's repositories

end2end-asr-pytorch

End-to-End Automatic Speech Recognition on PyTorch

Language: Python · License: MIT · 293 12 40

code-switching-papers

A curated list of research papers and resources on code-switching

License: Apache-2.0 · 291 24 6

lstm-attention

Attention-based bidirectional LSTM for Classification Task (ICASSP)

Language: Python · 107 7 2

few-shot-lm

The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)

Language: Python · License: Apache-2.0 · 52 50

indonesian-nlp

A curated list of research papers and resources on Indonesian languages

License: Apache-2.0 · 39 60

meta-emb

Multilingual Meta-Embeddings for Named Entity Recognition (RepL4NLP & EMNLP 2019)

Language: Python · 32 5 1

miners

MINERS ⛏️: The semantic retrieval benchmark for evaluating multilingual language models.

Language: Python · License: Apache-2.0 · 9 3 1

gentaiscool.github.io

matrix_fact

Matrix Factorization Library

Language: Python · License: BSD-3-Clause · 7 40

distfuse

A library to calculate similarity scores between two collections of text sequences encoded using transformer models for bitext mining, dense retrieval, retrieval-based classification, and retrieval-augmented generation (RAG).

Language: Python · License: Apache-2.0 · 4 2 1

xnli-dataset

Language: Python · 1 30

acl-anthology

Data and software for building the ACL Anthology.

Language: Python · License: Apache-2.0 · 020

aclpub2

Language: TeX · License: MIT · 010

BIG-bench

Beyond the Imitation Game collaborative benchmark for enormous language models

Language: Python · License: Apache-2.0 · 020

calcs2023

Language: Python · 030

calcs2023_ingest

Language: TeX · 020

calcs2023_test

Language: Python · 02 1

DataLab

The unified platform for data-related resources.

Language: Python · License: Apache-2.0 · 020

DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective.

Language: Python · License: MIT · 020

do-we-need-attention

Language: TeX · License: MIT · 010

human-preference-papers

License: Apache-2.0 · 000

lm-evaluation-harness

A framework for few-shot evaluation of autoregressive language models.

Language: Python · License: MIT · 020

mesh-transformer-jax

Model parallel transformers in JAX and Haiku

Language: Jupyter Notebook · License: Apache-2.0 · 020

mt-metrics-eval

Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.

Language: Python · License: Apache-2.0 · 000

mteb

MTEB: Massive Text Embedding Benchmark

Language: Python · License: Apache-2.0 · 000

NER-datasets

Datasets to train supervised classifiers for Named-Entity Recognition in different languages (Portuguese, German, Dutch, French, English)

Language: Python · 020

NL-Augmenter

NL-Augmenter 🦎 → 🐍 A Collaborative Repository of Natural Language Transformations

Language: Python · License: MIT · 020

nusa-datasets

Language: Python · 020

PromptPapers

Must-read papers on prompt-based tuning for pre-trained language models.

020

promptsource

Toolkit for creating, sharing and using natural language prompts.

Language: Python · License: Apache-2.0 · 020
