Khuyagbaatar Batsuren (kbatsuren)

Company: National University of Mongolia

Twitter: @khuyagbaatar_b

Khuyagbaatar Batsuren's repositories

CogNet

CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates

MorphyNet

MorphyNet: a Large Multilingual Database of Derivational and Inflectional Morphology (+morpheme segmentation)

wiktra

Wiktra: a Python tool for Wiktionary's transliteration modules, covering 514 languages and their 102 different scripts (orthographies)

Language: Lua | License: GPL-2.0 | Stargazers: 25 | Issues: 5 | Issues: 8

monwn

The Mongolian Wordnet (MonWN)

Language: TeX | License: NOASSERTION | Stargazers: 15 | Issues: 5 | Issues: 2

KinDiv

Lexical gap database in the kinship domain

UniMet

Metonymy corpus of 26 thousand instances in 189 languages across 24 metonymy patterns

Language: Shell | Stargazers: 0 | Issues: 0 | Issues: 0

baseline-pretraining

Code for pre-training BabyLM baseline models.

Language: Python | Stargazers: 0 | Issues: 0 | Issues: 0

cramming

Cramming the training of a (BERT-type) language model into limited compute.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

evaluation-pipeline

Evaluation pipeline for the BabyLM Challenge 2023.

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

subword-nmt

Unsupervised Word Segmentation for Neural Machine Translation and Text Generation

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

ud-compatibility

marry.py: A utility for converting Universal Dependencies–annotated corpora to UniMorph

Language: Python | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0