PreferredAI / cornac

A Comparative Framework for Multimodal Recommender Systems

Home Page: https://cornac.preferred.ai

[ASK] Why are the results from models so much lower than actual implementations of the model?

vedantc6 opened this issue · comments

Description

For example, NCF results come out as low as 0.1377 (NDCG@10), 0.1215 (Precision@10), and 0.1033 (Recall@10), whereas the original paper reports NDCG values around 0.40. Even other NCF implementations report evaluation numbers near 0.40.
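
For context, here is a minimal sketch of how such ranking numbers are typically produced with Cornac. The MovieLens 100K loader, the 80/20 split, and the NeuMF hyperparameters below are assumptions for illustration, not necessarily the setup behind the figures above:

```python
import cornac
from cornac.datasets import movielens
from cornac.eval_methods import RatioSplit
from cornac.metrics import NDCG, Precision, Recall

# Load MovieLens 100K feedback (assumed dataset; the original setup may differ)
feedback = movielens.load_feedback(variant="100K")

# 80/20 random split; exclude_unknowns drops test users/items unseen during training
rs = RatioSplit(data=feedback, test_size=0.2, exclude_unknowns=True, seed=123)

# NeuMF with illustrative hyperparameters (not necessarily those used above)
neumf = cornac.models.NeuMF(num_factors=8, layers=[32, 16, 8], num_epochs=10, seed=123)

cornac.Experiment(
    eval_method=rs,
    models=[neumf],
    metrics=[NDCG(k=10), Precision(k=10), Recall(k=10)],
).run()
```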

Other Comments

Hi,

Thanks for asking.

In the NCF paper, item ranking for every user is performed over 100 items only, i.e., 1 held-out test item and 99 randomly sampled negative items; please refer to the evaluation protocols paragraph in the NCF paper. In Cornac, item ranking for a given user is performed over all items she has not interacted with, which is a more realistic setting, since in practice we do not know which items are positive or negative.
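
To make the protocol difference concrete, here is a rough, self-contained simulation; the catalogue size of 2,000 items and the held-out item's rank of 40 are made-up illustrative values, not numbers from this thread. The same model that misses the top 10 under all-items ranking looks strong under the 99-sampled-negatives protocol:

```python
import math
import random

random.seed(0)

NUM_ITEMS = 2000  # hypothetical catalogue size (illustrative only)
TRUE_RANK = 40    # hypothetical rank of the held-out item among all unseen items

def ndcg_at_10(rank):
    """NDCG@10 with a single relevant item: 1/log2(rank + 1) if it makes the top 10, else 0."""
    return 1.0 / math.log2(rank + 1) if rank <= 10 else 0.0

# Cornac-style protocol: rank against every non-interacted item.
print("all-items NDCG@10:", ndcg_at_10(TRUE_RANK))  # 0.0, rank 40 misses the top 10

# NCF-paper protocol: rank against 99 randomly sampled negatives.
# Each of the 39 items the model scores above the test item is sampled with
# probability ~99 / (NUM_ITEMS - 1), so most of them vanish from the candidate list.
results = []
for _ in range(10000):
    sampled_stronger = sum(random.random() < 99 / (NUM_ITEMS - 1) for _ in range(TRUE_RANK - 1))
    results.append(ndcg_at_10(sampled_stronger + 1))
print("99-negatives NDCG@10 (average):", sum(results) / len(results))
```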

Thank you for the prompt response. Much appreciated.

Additionally, how have you calculated RMSE and MAE for NCF? The original paper does not report them, because NCF is a ranking model, not a rating prediction model.

Technically, RMSE and MAE can be computed for NCF, since the model predicts a score for every (user, item) pair. However, as you mentioned, one should not rely on these measures for evaluating NCF, which is a ranking model.
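
As an illustration of that point, here is a minimal sketch (same assumed data and hyperparameters as the earlier sketch) where RMSE and MAE are simply listed next to a ranking metric. Cornac reports them because NeuMF produces a score for every (user, item) pair, even though they should not be used to judge the model:

```python
import cornac
from cornac.datasets import movielens
from cornac.eval_methods import RatioSplit
from cornac.metrics import MAE, NDCG, RMSE

# Same assumed data and model as in the earlier sketch.
feedback = movielens.load_feedback(variant="100K")
rs = RatioSplit(data=feedback, test_size=0.2, exclude_unknowns=True, seed=123)
neumf = cornac.models.NeuMF(num_factors=8, layers=[32, 16, 8], num_epochs=10, seed=123)

cornac.Experiment(
    eval_method=rs,
    models=[neumf],
    # RMSE/MAE are computed from the predicted scores, but only ranking metrics
    # such as NDCG should be used to assess a ranking model like NCF/NeuMF.
    metrics=[RMSE(), MAE(), NDCG(k=10)],
).run()
```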