CogComp / saul

Saul : Declarative Learning-Based Programming


training a classifier should overwrite the .lex

kordjamshidi opened this issue · comments

It seems that if a classifier's .lex file was created in an earlier run and still exists at the default path, retraining the classifier adds new features to that same lexicon; that is, the lexicon is not overwritten.
(We need tests for load, save and when classifiers are created from scratch. related to #411 )

@danyaljj do you have any comments on this?

Just to clarify: are you saying that training a model writes to disk (the lexicon file) before/without calling save()?

No, save is not the issue either way. The problem is that when a .lex file already exists from a previous run, train() just reuses it and appends new features to it, which leads to the lexicon exploding in size as we run the app and call train() repeatedly (in separate, independent runs).

I see. So you think we should always remove the lexicon file at the beginning of train?

I expected it to be overwritten by default; we should have a way to indicate whether we want to continue training or train from scratch. Simply removing those files at the beginning of train would be problematic when we want to initialize models from an existing .lex and .lc.

Right, I agree it's tricky.
We could prompt the user at the beginning of training:

Do you want to remove existing model files? [Y/N]

What do you think?

Sounds good to me. @Rahgooy might have comments.

I think that works for training a single model, but when we want to train multiple models, say in a loop, the user would have to wait for the first model to finish training and then enter [Y/N]. IMO, the better option is to have it as a parameter or something similar.

In fact, for joint training we already have the init parameter: here
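A parameter-based approach could look roughly like the sketch below. This is not Saul's actual API; `TrainerSketch`, `train`, `fromScratch`, and the file names are hypothetical, and it only illustrates deleting stale model files before training unless the caller asks to continue from them:

```scala
import java.nio.file.{Files, Paths}

// Hypothetical sketch (not Saul's real trainer): control via a parameter
// whether existing model files are removed before training starts.
object TrainerSketch {
  // File names are illustrative stand-ins for a classifier's .lex/.lc files.
  private val modelFiles = Seq("classifier.lex", "classifier.lc")

  def train(modelDir: String, fromScratch: Boolean = true): Unit = {
    if (fromScratch) {
      // Delete any stale lexicon/classifier files so train() writes a
      // fresh lexicon instead of appending features to the old one.
      modelFiles.foreach { name =>
        Files.deleteIfExists(Paths.get(modelDir, name))
      }
    }
    // ... proceed with training, producing new .lex/.lc files ...
  }
}
```

With `fromScratch = false` the existing files are left in place, covering the case where models should be initialized from a previously saved lexicon, and no interactive [Y/N] prompt is needed when training many models in a loop.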