huu4ontocord / sungai

Sample multilingual data and tools for creating the data, used for multilingual NLP research

sungai

Sungai (pronounced soon-nai) means "river" in Malay. It is a sample multilingual dataset intended for multilingual model distillation in NLP. "mdd" stands for "multilingual distillation dataset".

About

License: Apache License 2.0