Add a Benchmark for Asian Languages
KennethEnevoldsen opened this issue
Kenneth Enevoldsen commented
Linguistic Families and Proposed Languages:
East Asian Languages
- Chinese (Mandarin) - cmn
- Cantonese - yue (#370)
- Japanese - jpn
- Korean - kor
- Mongolian - mon
South Asian Languages
- Indic Languages:
- Hindi - hin
- Bengali - ben
- Punjabi - pan
- Marathi - mar
- Gujarati - guj
- Urdu - urd
- Nepali - nep
- Sinhala - sin
- Tamil - tam
- Telugu - tel
- Kannada - kan
- Malayalam - mal
- Dravidian Languages:
- Included above (Tamil, Telugu, Kannada, Malayalam)
Southeast Asian Languages
- Austronesian Languages:
- Indonesian - ind
- Filipino - fil (#472)
- Malay - msa
- Javanese - jav
- Tai-Kadai Languages:
- Thai - tha
- Lao - lao
- Austroasiatic Languages:
- Vietnamese - vie (see #364)
- Khmer - khm
- Sino-Tibetan Languages:
- Burmese - mya
Central Asian Languages
- Turkic Languages:
- Kazakh - kaz
- Uzbek - uzb
- Turkmen - tkm
- Kyrgyz - kir
- Uighur - uig
West Asian (Middle Eastern) Languages
- Semitic Languages:
- Arabic - ara
- Hebrew - heb
- Iranian Languages:
- Persian - fas
- Kurdish - kur
- Pashto - pus
- Dari - prs
Note that this list is not comprehensive; feel free to add to it.
rasdani commented
I will take a stab at a Bengali benchmark together with a colleague of mine 👍
Kenneth Enevoldsen commented
Wonderful, @rasdani! Feel free to create an issue for this as well, so that others can see that you are working on it.