U096883L Shawn Tan
Included source files:
- `Tagger.java`: Main part of the program. Contains code for:
    - Learning transition probabilities from a given text file.
    - The Viterbi algorithm for inferring the POS tags for a string.
    - Smoothing. The smoothing method is modular, and is implemented by extending (and then adding) the `Smoother` inner class.
- `build_tagger.java`: Code that instantiates `Tagger` and initiates learning. On running `java build_tagger train_file test_file model_file`, the following occurs:
    - Instantiates `Tagger` and initialises learning using `train_file`.
    - Writes the model to `model_file` as a serialised object.
    - Proceeds to evaluate the learnt model using `test_file`. `test_file` has to be a labelled file in the same format as `train_file`.
    - Outputs a confusion matrix for the test: a (No. of POS tags) x (No. of POS tags) grid.
    - Outputs recall, precision and F1-measure per tag.
- `run_tagger.java`: Usage: `java run_tagger test_file model_file out_file`. On running, it deserialises the model from `model_file` and tags the sentences in `test_file` before writing them to `out_file`.
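
As a rough illustration of the learning, smoothing and Viterbi steps listed under `Tagger.java`, the sketch below counts tag transitions and word emissions, smooths them, and decodes the most probable tag sequence. It is not the code in this repository: the class name `HmmSketch`, its methods, the `word/TAG` token format and the choice of add-one smoothing are all assumptions made for the example, and the real `Smoother` subclasses may work quite differently.

```java
import java.util.*;

/** Hypothetical sketch of a bigram HMM tagger; not the actual Tagger class. */
class HmmSketch {
    static final String START = "<s>";                               // sentence-start pseudo tag
    final Map<String, Map<String, Integer>> trans = new HashMap<>(); // previous tag -> tag -> count
    final Map<String, Map<String, Integer>> emit = new HashMap<>();  // tag -> word -> count
    final Set<String> tags = new TreeSet<>();
    final Set<String> vocab = new HashSet<>();

    /** Counts transitions and emissions from one sentence of "word/TAG" tokens (assumed format). */
    void learn(String sentence) {
        String prev = START;
        for (String tok : sentence.trim().split("\\s+")) {
            int slash = tok.lastIndexOf('/');
            String word = tok.substring(0, slash);
            String tag = tok.substring(slash + 1);
            bump(trans, prev, tag);
            bump(emit, tag, word);
            tags.add(tag);
            vocab.add(word);
            prev = tag;
        }
    }

    static void bump(Map<String, Map<String, Integer>> m, String a, String b) {
        m.computeIfAbsent(a, k -> new HashMap<>()).merge(b, 1, Integer::sum);
    }

    /** Add-one smoothed log P(b | a); `types` is the number of possible outcomes for b. */
    static double logProb(Map<String, Map<String, Integer>> m, String a, String b, int types) {
        Map<String, Integer> row = m.getOrDefault(a, Collections.emptyMap());
        int total = 0;
        for (int c : row.values()) total += c;
        return Math.log((row.getOrDefault(b, 0) + 1.0) / (total + types));
    }

    /** Viterbi decoding: returns the most probable tag sequence for the given words. */
    String[] tag(String[] words) {
        int n = words.length;
        if (n == 0) return new String[0];
        String[] t = tags.toArray(new String[0]);
        int k = t.length;
        double[][] best = new double[n][k]; // best log score of a path ending in tag j at word i
        int[][] back = new int[n][k];       // previous tag index on that best path
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                // vocab.size() + 1 reserves one outcome for unseen words
                double e = logProb(emit, t[j], words[i], vocab.size() + 1);
                if (i == 0) {
                    best[i][j] = logProb(trans, START, t[j], k) + e;
                    continue;
                }
                best[i][j] = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = best[i - 1][p] + logProb(trans, t[p], t[j], k) + e;
                    if (s > best[i][j]) {
                        best[i][j] = s;
                        back[i][j] = p;
                    }
                }
            }
        }
        // Pick the best final tag, then follow the back-pointers to the start.
        int cur = 0;
        for (int j = 1; j < k; j++) {
            if (best[n - 1][j] > best[n - 1][cur]) cur = j;
        }
        String[] out = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            out[i] = t[cur];
            cur = back[i][cur];
        }
        return out;
    }

    public static void main(String[] args) {
        HmmSketch h = new HmmSketch();
        h.learn("the/DT dog/NN barks/VBZ");
        h.learn("the/DT cat/NN sleeps/VBZ");
        // prints "DT NN VBZ"
        System.out.println(String.join(" ", h.tag(new String[] {"the", "cat", "barks"})));
    }
}
```

Working in log space avoids underflow on long sentences. Keeping the smoothing in one method (`logProb` here) is one way to make the smoothing method swappable, which is roughly the role the `Smoother` inner class is described as playing.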