PageRank vs your formulation of PageRank

Question

PageRank vs your formulation of PageRank

mdanb opened this issue 3 years ago · comments

Hi,

Thank you for this interesting work! I just have a few clarification questions about the formulation of the (original) PageRank that you presented in the paper. You mention that PageRank can be calculated via r = Ar, but from what I understand from PageRank, this isn't the complete formulation of PageRank. PageRank also includes terms that take into account what they call "rank sinks." I'm wondering why this is missing in your equations.

Thank you!

Johannes Gasteiger, né Klicpera · Answer 1 · Tue Apr 06 2021 19:52:49 GMT+0800 (China Standard Time)

Indeed, regular PageRank includes a globally uniform teleport vector to counteract so-called spider traps (or rank sinks). However, this is only needed for directed graphs. In this work we only consider undirected graphs. Furthermore, PPNP is based on personalized PageRank (PPR), for which the teleport vector uses only the root node. This already counteracts spider traps and thus no global teleport is needed.

Does this answer your question?

mdanb · Answer 2 · Wed Apr 07 2021 02:36:57 GMT+0800 (China Standard Time)

Yes thank you. As a follow up question, I was wondering if you could elaborate on the following paragraph from your paper. I now understand PPNP fully, but I don't quite understand how you go from PPNP to APPNP, which you describe here:

Johannes Gasteiger, né Klicpera · Answer 3 · Wed Apr 07 2021 05:30:12 GMT+0800 (China Standard Time)

APPNP leverages that we don't need one PPR vector for every node, but only one PPR vector per class. This is just a matter of interpretation, though. The main difference between PPNP and APPNP is that APPNP doesn't exactly calculate the PPR vectors, but instead approximates them via power iterations.

You can actually go even further with this PPR approximation, which then leads to the massively scalable PPRGo model: https://arxiv.org/abs/2007.01570