RathanakSreang / KhmerWordSegmentation

Separate Khmer words from given sentences.

Home Page:https://viblo.asia/p/nlp-khmer-word-segmentation-YWOZrgNNlQ0

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

KhmerWordSegmentation(NLP)

Problem

Unlike other languages, Khmer Word Segmentation is way more complected. Because the Khmer language does not have any standard rule on how we are using space to separate between each word(space are used for easier reading). Moreover, Khmer word can have different meaning with the order of words when it will form. Khmer word could also be a join of two or more Khmer words together.

Because of uncertain rule of spacing and the complicated structure above, which it is hard to segment Khmer Word.

Why we build it?

Ref:

Plan

1.Build web site for:

  • word segmentations: user to input string of sentences and submit then it response with list of words in those sentences.
  • words checking: user submit sentences then it response with sentences and some suggestion word
  • words contribution: allow user input Khmer words with it function(noun, verb,...) then we use it to train our model

About

Separate Khmer words from given sentences.

https://viblo.asia/p/nlp-khmer-word-segmentation-YWOZrgNNlQ0


Languages

Language:Jupyter Notebook 52.2%Language:Python 47.8%