Genta Indra Winata's starred repositories
machine_learning_complete
A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.
nusa-crowd
A collaborative project to collect datasets in Indonesian languages.
Open-Instruction-Generalist
Open Instruction Generalist is an assistant trained on massive synthetic instructions to perform many millions of tasks
ACL-anthology-corpus
This repository provides details and links to the ACL anthology corpus/collection, including .bib, .pdf and GROBID extractions of the PDFs
minilmv2.bb
Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)
indonesian-nlp
A curated list of research papers and resources on Indonesian languages
kbir_keybart
Experimental code used in pre-training the KBIR and KeyBART models
nusa-writes
NusaWrites is an in-depth analysis of corpora collection strategy and a comprehensive language modeling benchmark for underrepresented and extremely low-resource Indonesian local languages.
english-speaker-friendly-korean-companies
Repository to aggregate data about Korean companies that work with English as an official language or accept non-Korean-speaking members
KnowExpert
The implementation of the paper "Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters".
LLM-Code-Mixing
Can LLMs generate code-mixed sentences through zero-shot prompting?
code-mixed-lid
Word-level language identification for Bangla-English code-mixed social media data, using a BiLSTM with subword embeddings.
nusa-catalogue
Dataset Catalogue Homepage for Indonesian Languages
globalbench
GlobalBench: A Benchmark for Global Progress in Language Technology
Weakly-Supervised-Multitask-MAR
Weakly-supervised Multitask Multimodal Affect Recognition.