NeoFantom / nlphomework

NLP homework at Shanxi University that includes Chinese word segmentation and Chinese part-of-speech (POS) tagging.

nlphomework

This is an NLP exercise that includes Chinese word segmentation and Chinese part-of-speech (POS) tagging.

Word segmentation algorithm: Max-length-match.
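As a rough illustration of maximum-length matching, here is a minimal sketch of the forward variant. It is not taken from this repo: the class name `MaxMatchSketch`, the method `segment`, the toy dictionary, and the choice of forward (rather than backward) matching are all assumptions. The sample uses an unspaced Latin string to stay encoding-independent; the same logic applies to Chinese text because the method only indexes characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MaxMatchSketch {

    /**
     * Forward maximum matching: at each position, take the longest
     * dictionary word that starts there (up to maxWordLen characters).
     * If nothing matches, fall back to a single character.
     */
    static List<String> segment(String text, Set<String> dict, int maxWordLen) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLen);
            // Shrink the candidate until it is in the dictionary
            // or only one character is left.
            while (end > i + 1 && !dict.contains(text.substring(i, end))) {
                end--;
            }
            words.add(text.substring(i, end));
            i = end;
        }
        return words;
    }

    public static void main(String[] args) {
        // Hypothetical toy dictionary, for illustration only.
        Set<String> dict = new HashSet<>(Arrays.asList("the", "theta", "table", "ble"));
        // Greedy matching picks "theta" first, so the result is [theta, ble]
        // rather than the intended [the, table].
        System.out.println(segment("thetable", dict, 5));
    }
}
```

The example also shows the known weakness of this greedy approach: the longest match at the current position is not always the right word boundary.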

POS tagging algorithm: HMM with Viterbi algorithm.
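For readers new to Viterbi decoding, the following is a minimal sketch of how the most likely tag sequence can be recovered from an HMM. It is not the repo's implementation: the class `ViterbiSketch`, the method `decode`, the integer encoding of tags and words, and the toy probabilities are all assumptions made for illustration. Log probabilities are used to avoid underflow on long sentences.

```java
public class ViterbiSketch {

    /**
     * Returns the most likely tag index sequence for the observed word indices.
     * start[t]       : probability that a sentence starts with tag t
     * trans[p][t]    : probability of tag t following tag p
     * emit[t][w]     : probability of tag t emitting word w
     */
    static int[] decode(int[] obs, double[] start, double[][] trans, double[][] emit) {
        int n = obs.length;
        int k = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][k]; // best log probability ending in tag t at position i
        int[][] back = new int[n][k];        // best previous tag for each (i, t)

        for (int t = 0; t < k; t++) {
            score[0][t] = Math.log(start[t]) + Math.log(emit[t][obs[0]]);
        }
        for (int i = 1; i < n; i++) {
            for (int t = 0; t < k; t++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + Math.log(trans[p][t]);
                    if (s > bestScore) {
                        bestScore = s;
                        best = p;
                    }
                }
                score[i][t] = bestScore + Math.log(emit[t][obs[i]]);
                back[i][t] = best;
            }
        }
        // Pick the best final tag, then follow the back pointers.
        int last = 0;
        for (int t = 1; t < k; t++) {
            if (score[n - 1][t] > score[n - 1][last]) {
                last = t;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int i = n - 1; i > 0; i--) {
            path[i - 1] = back[i][path[i]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Hypothetical toy model: tags 0 = noun, 1 = verb; words 0 and 1.
        double[] start = {0.8, 0.2};
        double[][] trans = {{0.3, 0.7}, {0.6, 0.4}};
        double[][] emit = {{0.9, 0.1}, {0.2, 0.8}};
        int[] tags = decode(new int[] {0, 1}, start, trans, emit);
        System.out.println(java.util.Arrays.toString(tags)); // prints [0, 1]
    }
}
```

In a real tagger the three probability tables would be estimated from a tagged corpus, usually with smoothing so that unseen words do not get zero probability.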

Use IntelliJ instead of Eclipse

I wrote this project in IntelliJ, so IntelliJ is strongly recommended over Eclipse: with IntelliJ everything works out of the box, while with Eclipse you have to set some things up manually.

How to run

Before you run it, check the Constants class and manually create the directories it specifies. After that, everything is set. Three classes contain a main function. The ones in SegmentationTask and PosTaggingTask each run a single task. The main function in ChineseNaturalLanguageProcess runs both tasks in one go.
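If you would rather not create the directories by hand, something like the sketch below could do it. This is not part of the repo: the class `CreateDirsSketch`, the method `ensureDirectories`, and the example paths are hypothetical, and you would need to pass in whatever paths Constants actually defines.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CreateDirsSketch {

    /** Creates each directory (and any missing parents); existing ones are left alone. */
    static void ensureDirectories(String... dirs) throws IOException {
        for (String dir : dirs) {
            Files.createDirectories(Paths.get(dir));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical paths; replace them with the values from Constants.
        ensureDirectories("data/corpus", "data/output");
    }
}
```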

How to change experimental parameters

All tunable constants for this experiment are defined in Constants. Feel free to modify Constants and the three main functions to suit your needs; they are easy to understand and change.

Also, if you have basic knowledge of Java and NLP, this repo should be easy to understand. The key parts are well commented.

License

Copyright Neo

Licensed under the No License, Version 3.1415926. Anyone who uses this repo and considers it good work must give me a star. You may use this repo for whatever you want without notifying anybody, with one exception: if you are using this repo to finish your homework, I sincerely hope you will tell your teacher that it is based on a GitHub project rather than claiming it as your own work.

The full text of this License is available at:

http://no.license.site/OK_I'm_kidding

Unless required by applicable law, kati doku blabla. Let's just skip this part and begin coding.

Languages

Java 99.4%, Python 0.6%