Khuyagbaatar Batsuren's repositories
Language: Shell
baseline-pretraining
Code for pre-training BabyLM baseline models.
Language: Python
cramming
Cramming the training of a (BERT-type) language model into limited compute.
Language: Python | License: MIT
evaluation-pipeline
Evaluation pipeline for the BabyLM Challenge 2023.
Language: Python | License: MIT
subword-nmt
Unsupervised Word Segmentation for Neural Machine Translation and Text Generation
Language: Python | License: MIT
ud-compatibility
marry.py: A utility for converting Universal Dependencies–annotated corpora to UniMorph
Language: Python | License: GPL-3.0
Language: Python
Language: Python
tokenizers
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Language: Rust | License: Apache-2.0
Language: Python | License: Apache-2.0