Unknown features
ralovets opened this issue
Unknown features (e.g. a new app_id or device_id that was not in the training data) lead to random predicted probabilities (either too low or too high). Could you suggest a workaround for this case when using LIBFFM?
(That's not necessarily a bad thing. It'll induce some exploration so that you can sample data for those features the next time you train your model.)
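It's random because the latent W_f_j vectors are initialized with random values before the actual training, and a feature that never appears in the training data is never updated, so its vectors keep those random values. You could set them to zeros, so that unseen features contribute nothing to the interaction terms (if every feature in an instance is unseen, the probability of belonging to the positive class becomes 0.5), or re-randomize them each time you predict. Additionally, or instead, you could include linear terms in your model and set an appropriate prior for the weights of unseen features.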
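A minimal sketch of the three suggestions, assuming a toy field-aware FM written in Python/NumPy rather than LIBFFM itself (which is C++). `ToyFFM`, `predict`, `seen`, `linear_prior` and the `(field, feature, value)` input format are all hypothetical names for illustration, not LIBFFM's API, and the uniform init scaled by `1/sqrt(k)` is an assumption about the initialization scheme.

```python
import numpy as np


class ToyFFM:
    """Hypothetical toy FFM (not LIBFFM's API or data layout).

    W[j, f] is the latent vector of feature j when it interacts with a
    feature from field f.
    """

    def __init__(self, n_features, n_fields, k, seed=0):
        self.k = k
        rng = np.random.default_rng(seed)
        # Random init before training (uniform, scaled by 1/sqrt(k): an assumption).
        self.W = rng.uniform(0.0, 1.0 / np.sqrt(k), size=(n_features, n_fields, k))
        self.linear = np.zeros(n_features)  # optional linear terms
        self.bias = 0.0
        # Training would set seen[j] = True for every feature j it updates.
        self.seen = np.zeros(n_features, dtype=bool)

    def _score(self, instance, W, linear):
        """instance: list of (field, feature, value) triples."""
        phi = self.bias + sum(linear[j] * x for _, j, x in instance)
        for a, (f1, j1, x1) in enumerate(instance):
            for f2, j2, x2 in instance[a + 1:]:
                # Field-aware interaction: <w_{j1,f2}, w_{j2,f1}> * x1 * x2
                phi += (W[j1, f2] @ W[j2, f1]) * x1 * x2
        return phi

    def predict(self, instance, unseen="keep", linear_prior=None, rng=None):
        """unseen: 'keep' (leftover random init), 'zero', or 'random' (fresh draw)."""
        W, linear = self.W.copy(), self.linear.copy()
        mask = ~self.seen
        if unseen == "zero":
            W[mask] = 0.0  # unseen features drop out of every interaction
        elif unseen == "random":
            if rng is None:
                rng = np.random.default_rng()
            W[mask] = rng.uniform(0.0, 1.0 / np.sqrt(self.k), size=W[mask].shape)
        if linear_prior is not None:
            linear[mask] = linear_prior  # prior weight for unseen features
        return 1.0 / (1.0 + np.exp(-self._score(instance, W, linear)))


model = ToyFFM(n_features=4, n_fields=2, k=4)
model.seen[[0, 1]] = True          # pretend training only touched features 0 and 1
new = [(0, 2, 1.0), (1, 3, 1.0)]   # e.g. a new app_id and a new device_id
print(model.predict(new))                                    # depends on the random init
print(model.predict(new, unseen="zero"))                     # 0.5
print(model.predict(new, unseen="zero", linear_prior=-1.0))  # sigmoid(-2) ≈ 0.119
```

Zeroing makes the prediction deterministic, re-randomizing keeps the exploration effect mentioned above, and the linear prior lets you shift unseen features toward a chosen default instead of leaving them at a neutral score.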