How to apply NCF to datasets that only have the number of interactions?

Question


sgaseretto opened this issue 6 years ago

Sebastian Gonzalez Aseretto commented 6 years ago

As the question states, how could this be applied to a dataset that only has the number of interactions between a user and an item? MovieLens includes ratings, which are explicit feedback, but how could this model be applied to a dataset like the audioscrobbler dataset, whose implicit feedback is the number of times a user listened to an artist? Here is an example of a recommender that uses ALS on that dataset: http://www.gousios.gr/courses/bigdata/audioscrobbler.html

Yihong Chen · Answer 1 · Mon Nov 26 2018 11:43:22 GMT+0800 (China Standard Time)

I think NCF handles implicit feedback as well. For a dataset that only contains interactions between users and items, you can try the BPR (Bayesian Personalized Ranking) criterion as the loss function. Specifically, the observed interactions are the positive samples, while negative samples are drawn from the items the user has not interacted with.
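A minimal sketch of the BPR setup described above, assuming audioscrobbler-style data held in a dict that maps `(user, item)` to a play count, with item ids in `range(n_items)`. Any positive count is treated as an interaction (the count itself is discarded), and each positive is paired with a negative drawn uniformly from items the user has never played. The helper names are hypothetical and not part of any NCF codebase; the scores passed to `bpr_loss` would come from whichever model (e.g. GMF or MLP) you train.

```python
import math
import random


def sample_negative(user, interacted, n_items, rng):
    """Uniformly draw an item the user has NOT interacted with (rejection sampling)."""
    if len(interacted[user]) >= n_items:
        raise ValueError(f"user {user} interacted with every item; no negative exists")
    while True:
        item = rng.randrange(n_items)
        if item not in interacted[user]:
            return item


def build_bpr_triples(play_counts, n_items, num_neg=1, seed=0):
    """Turn {(user, item): play_count} into (user, positive_item, negative_item) triples.

    Any count > 0 counts as a positive interaction; the magnitude is ignored.
    """
    rng = random.Random(seed)
    interacted = {}
    for (user, item), count in play_counts.items():
        if count > 0:
            interacted.setdefault(user, set()).add(item)
    triples = []
    for user, items in interacted.items():
        for pos in sorted(items):
            for _ in range(num_neg):
                triples.append((user, pos, sample_negative(user, interacted, n_items, rng)))
    return triples


def bpr_loss(pos_score, neg_score):
    """BPR loss for one triple: -log(sigmoid(pos_score - neg_score)), computed stably."""
    x = pos_score - neg_score
    if x >= 0:
        return math.log1p(math.exp(-x))
    # For x < 0, rewrite log(1 + e^-x) as -x + log(1 + e^x) to avoid overflow.
    return -x + math.log1p(math.exp(x))
```

Minimizing the summed `bpr_loss` over the triples pushes each observed item to score higher than a sampled unobserved one, which is why only the presence of an interaction, not a rating, is needed.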

Qrh · Answer 2 · Fri Mar 22 2019 14:53:11 GMT+0800 (China Standard Time)

According to the original paper, NCF was proposed to deal with implicit feedback. May I ask why this repo normalizes the ratings?

Yihong Chen · Answer 3 · Fri Mar 22 2019 17:14:40 GMT+0800 (China Standard Time)

In this repo, negative items get a rating of 0, and I normalized the other ratings into [0, 1]. I think it is fine not to normalize the ratings, but the hyper-parameters might then be harder to tune.
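The scheme described in that answer can be sketched as follows, assuming explicit ratings stored as `(user, item, rating)` tuples and that normalization means dividing by the largest rating in the data (the exact rule used in the repo is an assumption here). The function names are hypothetical; sampled negatives are appended with a target of 0.0, so every training target ends up in [0, 1].

```python
def normalize_ratings(ratings):
    """Scale explicit ratings into [0, 1] by dividing by the largest rating present."""
    max_rating = max(r for _, _, r in ratings)
    return [(u, i, r / max_rating) for u, i, r in ratings]


def with_negatives(normalized, negative_pairs):
    """Append sampled negative (user, item) pairs with a target of 0.0."""
    return normalized + [(u, i, 0.0) for u, i in negative_pairs]
```

With MovieLens-style 1 to 5 stars, a 5 becomes 1.0 and a 1 becomes 0.2, so a positive the user disliked still gets a nonzero target; this is what makes the setup explicit rather than implicit feedback.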

Qrh · Answer 4 · Fri Mar 22 2019 18:17:43 GMT+0800 (China Standard Time)


I took a look at the original authors' implementation and found the function here. They simply set all negative items to 0 and all items a user interacted with to 1, rather than normalizing the ratings (implicit feedback vs. explicit feedback).
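The binarization described above can be sketched like this, assuming interactions stored as `(user, item, value)` tuples where `value` is a rating or play count and sampled negatives already carry 0. The names `binarize` and `log_loss` are illustrative and are not taken from this repo or the authors' code. The pointwise binary log loss (binary cross-entropy) is the objective the NCF paper pairs with sampled negatives.

```python
import math


def binarize(interactions):
    """Map every observed interaction (value > 0) to 1.0 and everything else to 0.0."""
    return [(u, i, 1.0 if value > 0 else 0.0) for u, i, value in interactions]


def log_loss(label, prob, eps=1e-12):
    """Binary cross-entropy for one (label, predicted probability) pair."""
    # Clip the probability so log() never receives exactly 0.
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))
```

Because every interaction becomes a 1, this treats a single play and fifty plays the same way; the counts could still be used separately (for example as confidence weights), but that goes beyond what this thread describes.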

Yihong Chen · Answer 5 · Fri Mar 22 2019 22:30:52 GMT+0800 (China Standard Time)

@RuihongQiu Thank you for reporting the bug. I added support for implicit feedback in the latest commit. Could you check whether it works? I have only tested GMF.

Qrh · Answer 6 · Sat Mar 23 2019 10:02:12 GMT+0800 (China Standard Time)

@LaceyChen17 I will check it out soon.

Thanks a lot! I think the new code works.

I have checked all the experiments with new rating settings.

The first two experiments are actually explicit feedback with normalized ratings.
Files whose names end with "implicit" are the results of the newest commit.

I also implemented a new binarize method that works the same way "_normalize" does. Compared with the newest commit, it avoids changing many lines of code.

The results of the binarize method are in the files whose names end with "binarize".
If you wouldn't mind, I can open a pull request.

Yihong Chen · Answer 7 · Sun Mar 24 2019 21:00:25 GMT+0800 (China Standard Time)

Pull requests are extremely welcome!!! BTW, could you also share your training curves by updating README.md?

Qrh · Answer 8 · Sun Mar 24 2019 21:11:21 GMT+0800 (China Standard Time)

I've opened a pull request with the update.

Yihong Chen · Answer 9 · Mon Mar 25 2019 12:29:00 GMT+0800 (China Standard Time)

Merged. Thank you so much!