Zoltan Csaki's starred repositories
language-confusion
Repository for the "Understanding and Mitigating Language Confusion in LLMs" paper
lm-evaluation-harness
A framework for few-shot evaluation of language models.
llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Chinese-LLaMA-Alpaca
Chinese LLaMA & Alpaca large language models, plus local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
sentencepiece
Unsupervised text tokenizer for Neural Network-based text generation.
UD_English-EWT
English data
UD_Thai-PUD
Parallel Universal Dependencies.
UD_Hungarian-Szeged
Hungarian data