ycjuan / libffm

A Library for Field-aware Factorization Machines

Unknown features

ralovets opened this issue

Unknown features (such as a new app_id or device_id that was not in the training data) lead to random probabilities (too small or too high). Could you suggest a workaround for using LIBFFM in that case?

(That's not necessarily a bad thing. It induces some exploration, so you can sample data for those features before the next time you train your model.)
It's random because the latent vectors W_f_j are initialized with random values before the actual training, and a feature that never appears in the training data never gets its vectors updated. You could set them to zeros, so that the unseen feature contributes nothing to the score (and the probability of belonging to the positive class is 0.5 when nothing else contributes), or re-randomize them each time you predict. You could also include linear terms in your model and set an appropriate prior for the weights of unseen features.