MrChrisJohnson / implicit-mf

Implicit matrix factorization as outlined in http://yifanhu.net/PUB/cf.pdf.

Possible bug in CuI derivation?

bmcfee opened this issue

The per-user/per-item inner loop code does not appear to match the update equations (4) and (5) given in HKV2008.

Specifically, this block: https://github.com/bmcfee/implicit-mf/blob/master/mf.py#L75-L79

CuI = sparse.diags(counts_i, [0])
pu = counts_i.copy()
pu[np.where(pu != 0)] = 1.0
YTCuIY = fixed_vecs.T.dot(CuI).dot(fixed_vecs)
YTCupu = fixed_vecs.T.dot(CuI + eye).dot(sparse.csr_matrix(pu).T)

would match the paper if CuI had the identity subtracted off:

CuI = sparse.diags(counts_i, [0]) - eye

Or am I missing something clever here?

When loading the matrix we only load the alpha-scaled counts (alpha * r_ui), so CuI (which is terrible variable naming, I'm sorry) is really Cu - I, since we never added the + 1s. So YTCuIY is really Y^T * (Cu - I) * Y, and YTCupu is Y^T * Cu * pu (since we add the + 1s back in with the + eye).
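
For anyone who wants to check this numerically, here is a minimal sketch (not the repository's code) that compares the paper's user update, x_u = (Y^T C^u Y + lambda I)^-1 Y^T C^u p(u) with c_ui = 1 + alpha * r_ui, against the form described above. It uses dense NumPy arrays for brevity instead of scipy.sparse. The values of alpha, lam, the toy counts r_u, and the assumption that the solver adds a precomputed Y^T Y plus the regularizer to YTCuIY are all illustrative, not taken from mf.py.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_factors = 6, 3
alpha, lam = 40.0, 0.1  # hypothetical hyperparameters

Y = rng.normal(size=(n_items, n_factors))           # fixed item vectors
r_u = np.array([0.0, 2.0, 0.0, 1.0, 0.0, 5.0])      # toy raw counts for one user
counts_i = alpha * r_u                              # what gets loaded: no "+ 1" offset
p_u = (counts_i != 0).astype(float)                 # binary preferences p(u)

eye = np.eye(n_items)
CuI = np.diag(counts_i)   # this is C^u - I, because the + 1 was never added
Cu = CuI + eye            # full confidence matrix, c_ui = 1 + alpha * r_ui

# Update exactly as written in the paper.
A_paper = Y.T @ Cu @ Y + lam * np.eye(n_factors)
b_paper = Y.T @ Cu @ p_u
x_paper = np.linalg.solve(A_paper, b_paper)

# Update in the style discussed in this thread:
# Y^T C^u Y is split into Y^T Y + Y^T (C^u - I) Y (assumed to happen in the solver).
YTY = Y.T @ Y
YTCuIY = Y.T @ CuI @ Y
YTCupu = Y.T @ (CuI + eye) @ p_u
x_code = np.linalg.solve(YTY + YTCuIY + lam * np.eye(n_factors), YTCupu)

print(np.allclose(x_paper, x_code))
```

Under these assumptions the two solutions agree, which is consistent with reading CuI as Cu - I rather than Cu.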

Aha! I was missing something subtle; forgot about the offset in the initial definition of c. Thanks.

Hehe, no problem, glad to have another set of eyes looking at this code :)

Interestingly, when I change + eye to - eye in line 79, my validation metrics improve.