Unknown features

Question

ralovets opened this issue 8 years ago

Unknown features (such as a new app_id or device_id that was not in the training data) lead to random probabilities (too low or too high). Could you suggest a workaround for using LIBFFM in that case?

chapleau · Answer 1 · Thu Dec 22 2016 00:03:40 GMT+0800 (China Standard Time)

That's not necessarily a bad thing: it induces some exploration, so you can collect data for those features before the next time you train your model.
The predictions are random because the latent vectors W_f_j are initialized with random values before training, and the vectors of features that never appear in the training data are never updated. You could perhaps set them to zero so that an unseen feature contributes nothing to the score (if nothing else contributes either, the probability of the positive class is 0.5), or re-randomize them each time you predict. Alternatively, or in addition, you could include linear terms in your model and set an appropriate prior for the weights of unseen features.
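To make the zero-vector idea concrete, here is a minimal sketch of FFM scoring in plain Python. It does not use LIBFFM itself. The function name `ffm_probability`, the `(field, feature_id, value)` tuple layout, and the `latent` dictionary are all hypothetical and only for illustration. It also assumes a model with pairwise interaction terms only (no bias or linear terms) and skips any instance normalization.

```python
import math
from itertools import combinations


def ffm_probability(features, latent, k):
    """Score one instance with pairwise FFM interactions only.

    features: list of (field, feature_id, value) tuples (hypothetical layout).
    latent:   dict mapping (feature_id, other_field) -> list of k floats.
    Any feature missing from `latent` (i.e. unseen in training) gets a
    zero vector, so every interaction involving it contributes 0.
    """
    zero = [0.0] * k
    score = 0.0
    for (f1, j1, x1), (f2, j2, x2) in combinations(features, 2):
        w1 = latent.get((j1, f2), zero)  # vector of feature j1 for field f2
        w2 = latent.get((j2, f1), zero)  # vector of feature j2 for field f1
        score += sum(a * b for a, b in zip(w1, w2)) * x1 * x2
    return 1.0 / (1.0 + math.exp(-score))


# Toy "trained" vectors: field 0 is the app, field 1 is the device.
latent = {
    ("app_1", 1): [0.5, -0.2],
    ("dev_1", 0): [0.3, 0.4],
}

p_seen = ffm_probability([(0, "app_1", 1.0), (1, "dev_1", 1.0)], latent, k=2)
p_unseen = ffm_probability([(0, "app_new", 1.0), (1, "dev_1", 1.0)], latent, k=2)
print(p_seen, p_unseen)
```

In this two-feature toy case the only interaction involves the unseen `app_new`, so the score is 0 and the probability is exactly 0.5. With more fields, the interactions among the remaining known features would still contribute, so the result would generally differ from 0.5.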