AlexPoint / OpenNlp

Open source NLP tools (sentence splitter, tokenizer, chunker, coref, NER, parse trees, etc.) in C#

How to generate a Tag Dictionary?

NeomMob opened this issue

I am using the following code to train a POS model. My question is: how do I generate the tag dictionary that is required later to use the model?

        var trainingFile = "..";
        // The number of iterations; no general rule for finding the best value, just try several!
        var iterations = 5;
        // The cut; no general rule for finding the best value, just try several!
        var cut = 2;
        // Train the model (can take some time depending on your training file size)
        var model = MaximumEntropyPosTagger.TrainModel(trainingFile, iterations, cut); 
        // Persist the model to use it later
        var outputFilePath = @"...";
        new BinaryGisModelWriter().Persist(model, outputFilePath);
commented

When you create a new MaximumEntropyPosTagger object, you can pass a PosLookupList, which is your tag dictionary, as an argument.
If you don't, it defaults to DefaultPosContextGenerator.
Note that the tag dictionary and the GisModel are two distinct objects, so if you don't use the default you need to persist both of them, in separate files. Does that answer your question?
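
As for generating the dictionary contents themselves, one possible approach is to derive them from the same training file. The sketch below is not part of the library; it only uses the .NET standard library. It assumes the training file holds one sentence per line with tokens written as word_tag, and that the tag dictionary is a plain text file with one word per line followed by its allowed tags, separated by spaces. Both formats are assumptions to check against your own data and against what PosLookupList actually reads. The class name TagDictionaryBuilder and its methods are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of OpenNlp.
public static class TagDictionaryBuilder
{
    // Collects, for every word, the set of tags it appears with.
    // Assumed input: one sentence per line, tokens written as word_tag.
    public static SortedDictionary<string, SortedSet<string>> Build(IEnumerable<string> sentences)
    {
        var dictionary = new SortedDictionary<string, SortedSet<string>>(StringComparer.Ordinal);
        foreach (var sentence in sentences)
        {
            var tokens = sentence.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var token in tokens)
            {
                // Split on the last underscore so words containing '_' stay intact.
                var separator = token.LastIndexOf('_');
                if (separator <= 0 || separator == token.Length - 1)
                {
                    continue; // skip tokens without a usable word or tag
                }
                var word = token.Substring(0, separator);
                var tag = token.Substring(separator + 1);
                if (!dictionary.TryGetValue(word, out var tags))
                {
                    tags = new SortedSet<string>(StringComparer.Ordinal);
                    dictionary[word] = tags;
                }
                tags.Add(tag);
            }
        }
        return dictionary;
    }

    // Formats the dictionary as lines of the assumed form "word tag1 tag2 ...".
    public static IEnumerable<string> ToLines(SortedDictionary<string, SortedSet<string>> dictionary)
    {
        return dictionary.Select(entry => entry.Key + " " + string.Join(" ", entry.Value));
    }
}
```

With this in place, something like File.WriteAllLines(tagDictPath, TagDictionaryBuilder.ToLines(TagDictionaryBuilder.Build(File.ReadLines(trainingFile)))) would write the dictionary next to the persisted model, where tagDictPath is a path of your choosing.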

I haven't tried it yet, but it seems to answer all of my questions. Thanks for your support!