pswot / wordlists

Wordlists dictionary for Burmese (Myanmar)

Kanaung-Wordlists

Under construction

We have built Burmese wordlists from Myanmar Letter Ka (U+1000) "က" to Myanmar Letter A (U+1021) "အ". Currently some words are out of order and some duplicates occur. We will fix these errors once "Burmese Sorting" is complete. Contributions are welcome.
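Until proper Burmese sorting is available, duplicates and out-of-range entries can be caught with a few lines of code. The following is a minimal sketch, not the project's own tooling: the function names are hypothetical, it assumes one word per line, and the sort shown is plain code point order, which is only a stand-in for real Burmese collation.

```python
# Hypothetical helpers for tidying a wordlist (one word per line assumed).
FIRST, LAST = "\u1000", "\u1021"  # Myanmar Letter Ka .. Myanmar Letter A


def clean_wordlist(lines):
    """Strip whitespace, drop empty lines and duplicates, keep first-seen order."""
    seen = set()
    words = []
    for line in lines:
        word = line.strip()
        if not word or word in seen:
            continue
        seen.add(word)
        words.append(word)
    return words


def starts_in_range(word):
    """True if the first character lies between U+1000 and U+1021."""
    return bool(word) and FIRST <= word[0] <= LAST


def codepoint_sorted(words):
    """Sort by raw code point order. This is NOT Burmese dictionary order."""
    return sorted(words)
```

Entries for which `starts_in_range` returns False are candidates for manual review rather than automatic deletion.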

Sources

Currently, all of these words are taken from the "Burmese Spelling Book", officially published in 2003 by Myanmar Department of Education Ministry: "မြန်မာစာလုံးပေါင်းသတ်ပုံကျမ်း(ဒုတိယနှိပ်ခြင်း ၊၂၀၀၃-ခုနှစ်၊ ဇွန်လ)" (second printing, June 2003). We obtained a PDF copy and found that its text is encoded in standard Unicode (5.1 or later).

Modifications

  1. PDF text extraction does not reliably preserve text alignment, so some words came out in the wrong order, and trailing signs such as Asat (U+103A) "်" and Vowel Sign U (U+102F) "ု" were dropped. We had to add these characters back manually.

  2. We want the final lists to be clean and simple enough for use in other programs and in research. For that reason we removed all annotations explaining the correct usage of the words.
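When checking whether a trailing sign such as Asat (U+103A) was lost during extraction, it helps to see exactly which code points a word contains. The sketch below uses only Python's standard `unicodedata` module; the `describe` function is a hypothetical helper for illustration, not part of this repository.

```python
import unicodedata


def describe(word):
    """List each character of a word as 'U+XXXX NAME' for manual inspection."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"
        for ch in word
    ]


# Example: Ka followed by Asat.
for entry in describe("\u1000\u103a"):
    print(entry)
# U+1000 MYANMAR LETTER KA
# U+103A MYANMAR SIGN ASAT
```

Comparing this output against the printed book makes a missing final sign easy to spot.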

Purposes

  1. For dictionary writers, these wordlists should be a useful source.

  2. For NLP (Natural Language Processing) researchers, they may be useful in work that relies on a dictionary-lookup approach, such as POS tagging, building n-grams, Myanmar-English bilingual corpora, and Myanmar OCR applications.
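As an illustration of the dictionary-lookup approach mentioned above, here is a minimal sketch of greedy longest-match word segmentation against a wordlist. It is an assumption about how the lists might be used, not code from this project: the `segment` function is hypothetical, and it matches on raw code points, so it does not respect syllable boundaries.

```python
def segment(text, words):
    """Greedy longest-match segmentation of text using a set of known words.

    Characters that start no known word are emitted as single-character tokens.
    """
    max_len = max((len(w) for w in words), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in words:
                tokens.append(piece)
                i += size
                break
        else:
            # No dictionary word starts here; fall back to one character.
            tokens.append(text[i])
            i += 1
    return tokens


wordlist = {"မြန်မာ", "စာ"}
print(segment("မြန်မာစာ", wordlist))
```

Greedy matching is simple but can pick the wrong split when a shorter word would have allowed a better overall segmentation; it is shown here only because it depends on nothing beyond the wordlist itself.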

Future

  1. We'll update the lists with new words.

  2. Burmese sorting and related tools will be developed for several platforms.

License: Do What The F*ck You Want To Public License


Languages

Java 61.5%, PHP 38.5%