AnacletoLAB / grape

🍇 GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations

Why does get_all_node_embedding return a list of embedding matrices?

chansigit opened this issue

Could you:

  1. Provide the code you are using
  2. Mention the embedding method you are using
  3. Describe the behaviour you expect, and why the behaviour you see is unexpected.

I am working with a graph g_dtw:

from grape.embedders import Node2VecGloVeEnsmallen

embedding = Node2VecGloVeEnsmallen(embedding_size=50, walk_length=10, max_neighbours=20).fit_transform(g_dtw)

and trying to get the embedding results:

z1 = embedding.get_all_node_embedding()

only to find that the returned z1 is a list containing two embedding matrices. Why?
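
A minimal sketch of inspecting z1, assuming the returned matrices expose a shape attribute (as both numpy arrays and pandas DataFrames do):

print(type(z1), len(z1))                            # list of length 2 for this embedder
for i, matrix in enumerate(z1):
    print(f"matrix {i}: shape {matrix.shape}")      # (number of nodes, embedding_size)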

I am not sure what else you would have expected - let me break it down for you.

  • Node2Vec is a node embedding approach based on sampling random walks that are then used as input for NLP models such as Word2Vec CBOW and SkipGram, or GloVe, which is the option you chose.
  • All three models, CBOW, SkipGram and GloVe, are characterized by TWO word embeddings. These node embeddings have different interpretations depending on the selected model, which I would characterize as follows:
    • In CBOW, the first embedding is the 'context representation' of a node, while the second embedding is the 'central representation' of a node. The model learns to bind the two embeddings in such a way that the dot product between true contextual nodes and central nodes is maximal.
    • In SkipGram, the situation is similar but inverted: the first embedding is the 'central representation' and the second is the 'context representation' of a node.
    • In GloVe, the first node embedding is interpretable as the source node embedding and the second as the destination node embedding. This is because the model tunes the dot product of the source and destination node embeddings so as to estimate the co-occurrence of the two nodes in the random walks.

I hope this clarifies why these models have two distinct node embeddings with different characteristics.
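
A minimal sketch of how the two matrices could be used downstream, assuming they behave like numpy arrays or pandas DataFrames; keeping only the first matrix or averaging the two are common heuristics for Word2Vec/GloVe-style models, not a requirement of GRAPE:

import numpy as np

first_embedding, second_embedding = embedding.get_all_node_embedding()

# Option 1: keep only the first matrix (e.g. the source/central representation).
z = np.asarray(first_embedding)

# Option 2: average the two representations into a single matrix,
# a frequent heuristic for Word2Vec/GloVe-style embeddings.
z_avg = (np.asarray(first_embedding) + np.asarray(second_embedding)) / 2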

If you are referring to the choice of several other libraries not to make these or other features available, you should ask them.

Thank you so much for the explanation!

I thought only one embedding would be returned.

No worries, I understand that there is some confusion regarding these topics. Could you take a minute to describe to me how the library experience could be improved to help you more intuitively understand what was happening?

A docstring!
I tried to understand the output by myself, but this function does not seem to have a docstring. A description of the returned list would clarify the function's design.
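
Purely as an illustration of the request, a hypothetical docstring for get_all_node_embedding might read as follows (a sketch, not the actual GRAPE source):

def get_all_node_embedding(self):
    """Return all node embedding matrices produced by the fitted model.

    Returns
    -------
    A list of embedding matrices. Models such as CBOW, SkipGram and GloVe
    learn TWO embeddings per node (context/central or source/destination),
    so the list contains one matrix per role.
    """
    ...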

You are so devoted to GRAPE. I will gladly introduce your work to my colleagues.