OpenPecha / pybo

🦜 NLP for Tibetan, in Python.

Home Page:https://esukhia.github.io/pybo/


User Stories

ngawangtrinley opened this issue

| # | As a... | I want to... | so that... |
|---|---------|--------------|------------|
| 1 | researcher | segment a collection of Tibetan texts | I can do statistics in AntConc |
| 2 | Tibetan text proofreader | mark potential errors | I can catch and correct more mistakes |
| 3 | corpus researcher for Amdo dialects | create several custom profiles | I can do statistics on different spoken dialects |
| 4 | corpus researcher on literary Tibetan | create a custom profile for the Kangyur | I can do accurate statistics on the Kangyur and Tengyur |
| 5 | | | |

Rule-based segmentation steps (for stories 3 & 4)

  1. Segment a volume with the default profile
  2. Create a word list from the volume, ordered by frequency
  3. Manually clean up the word list
  4. Use the word list as the main list
  5. Segment the volume again
  6. Edit the custom profile (word / remove / adjustments) until the segmentation is good
  7. Merge custom profile with main profile
  8. Repeat with a second volume
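
Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration: `frequency_word_list` and `whitespace_segment` are hypothetical names, not part of pybo, and the whitespace splitter is a stand-in for whatever tokenizer call the default profile actually exposes.

```python
from collections import Counter
from typing import Callable, Iterable, List, Tuple


def frequency_word_list(
    texts: Iterable[str],
    segment: Callable[[str], List[str]],
) -> List[Tuple[str, int]]:
    """Segment each text and return (word, count) pairs, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(segment(text))
    # Sort by descending count, then by the word itself so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def whitespace_segment(text: str) -> List[str]:
    """Placeholder segmenter (assumption): splits on whitespace only.

    Swap in the real tokenizer with the default profile here.
    """
    return text.split()


# Toy "volume": two lines that are already space-separated into words.
volume = ["བཀྲ་ཤིས་ བདེ་ལེགས་ བཀྲ་ཤིས་", "བདེ་ལེགས་ བཀྲ་ཤིས་"]
word_list = frequency_word_list(volume, whitespace_segment)

# One word per line with its count, ready for manual cleanup (step 3).
for word, count in word_list:
    print(f"{word}\t{count}")
```

Passing the segmenter in as a function means the same loop can be reused in step 5, once the cleaned word list has been turned into a custom profile.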

Steps for story # & #