Part-of-Speech Tagger Using a Hidden Markov Model and the Viterbi Decoding Algorithm
This project is a Hidden Markov Model part-of-speech tagger for Catalan. The training data is provided tokenized and tagged; the test data is provided tokenized only, and the tagger adds the tags.
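To make the approach concrete, here is a minimal sketch of how an HMM tagger with Viterbi decoding can work. It is not the project's actual code: the function names (train_hmm, log_prob, viterbi), the add-one smoothing, and the toy sentences with the tags D, N and V are all assumptions made for illustration.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence-boundary states


def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    transitions = defaultdict(lambda: defaultdict(int))
    emissions = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            prev = tag
        transitions[prev][END] += 1
    return transitions, emissions


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (the smoothing is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, transitions, emissions):
    """Return the most probable (word, tag) sequence for a tokenized sentence."""
    if not words:
        return []
    tags = list(emissions)
    vocab = {w for counts in emissions.values() for w in counts}
    n_trans = len(tags) + 1   # any tag, or END
    n_emit = len(vocab) + 1   # known words plus one slot for unknown words

    # best[i][tag] = (log prob of the best path ending in tag at i, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_trans)
                 + log_prob(emissions[tag], words[0], n_emit))
        best[0][tag] = (score, None)

    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_emit)
            prev_tag = max(
                tags,
                key=lambda p: best[i - 1][p][0]
                + log_prob(transitions[p], tag, n_trans),
            )
            score = (best[i - 1][prev_tag][0]
                     + log_prob(transitions[prev_tag], tag, n_trans)
                     + emit)
            best[i][tag] = (score, prev_tag)

    # Pick the best final tag, including the transition to END, then backtrack.
    last = max(
        tags,
        key=lambda t: best[-1][t][0] + log_prob(transitions[t], END, n_trans),
    )
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return list(zip(words, path))


# Toy training data with hypothetical tags (not the real Catalan tag set).
toy = [
    [("el", "D"), ("gat", "N"), ("dorm", "V")],
    [("la", "D"), ("gata", "N"), ("menja", "V")],
]
transitions, emissions = train_hmm(toy)
tagged = viterbi(["el", "gat", "menja"], transitions, emissions)
print(tagged)  # prints [('el', 'D'), ('gat', 'N'), ('menja', 'V')]
```

Working in log space avoids numeric underflow on long sentences, and the extra slot in the emission denominator gives unseen words a small non-zero probability so that decoding never fails on them.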
The data is provided as three files. The first contains tagged training data in the word/TAG format, with words separated by spaces and each sentence on a new line. The second contains untagged development data, with words separated by spaces and each sentence on a new line. The third contains tagged development data in the same word/TAG format and serves as an answer key.
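The word/TAG format described above can be read with a few lines of code. The sketch below is illustrative only: the helper names (parse_tagged_line, parse_untagged_line, accuracy) are hypothetical, and splitting on the last slash is an assumption made so that words which themselves contain a slash stay intact.

```python
def parse_tagged_line(line):
    """Split one 'word/TAG word/TAG ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Assumption: the tag follows the LAST slash in the token.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def parse_untagged_line(line):
    """Split one untagged line into its space-separated words."""
    return line.split()


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the answer key."""
    correct = total = 0
    for pred_sentence, gold_sentence in zip(predicted, gold):
        for (_, pred_tag), (_, gold_tag) in zip(pred_sentence, gold_sentence):
            correct += pred_tag == gold_tag
            total += 1
    return correct / total if total else 0.0
```

Each file is then processed line by line, one sentence per line, and the tagger's output on the untagged development file can be compared against the answer key with accuracy.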