Add `NodeEmbeddings` abstract type
eriknw opened this issue
This is like a `NodeMap` where the value is a `Vector`. Just as a `NodeMap` can be converted to a `Vector`, a `NodeEmbedding` can be converted to a `Matrix`.
I'm unsure about the name. `NodesEmbedding`? `NodeEmbeddings`? `NodeMapToVectors`? `NodeMapOfVectors`? `NodeToVectors`?
I'm going to bring up some thoughts and questions I had about the `Embedding` type given what we discussed in our last meeting.
## Embedding Implementation
In our last meeting, we talked about how it's useful to separate out the training and inference phases of the embedding algorithm. Here was my thought on how we could accomplish this with an `Embedding` abstract type:
- It should have a required `__call__` method.
- It should have a required `input_type` property.
- It should have a required `return_type` property. This should be a Matrix concrete type.
- The `__call__` method would take a tuple of inputs, which can be graphs, nodes, tuples (tuples are useful since `graph_sage` takes a graph+node and returns an embedding representing that node's embedding within the graph), edges, etc. and return a matrix of size `input_count x embedding_size`.
- Example:
  ```python
  embedding = res.algos.embedding.graph_sage(..., embedding_size=500, ...)
  matrix = embedding( ((graph_1, node_1), ..., (graph_N, node_N)) )  # matrix has shape N x 500
  ```
- Example:
  ```python
  embedding = res.algos.embedding.node2vec(..., embedding_size=200, ...)
  matrix = embedding( (node_1, ..., node_N) )  # matrix has shape N x 200
  ```
This would sufficiently separate the training and inference phases of the embedding. The training happens when the embedding algorithm is called, while the inference happens when the returned `Embedding`'s `__call__` method is used.
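To make the proposal concrete, here is a minimal runnable sketch of what the abstract type could look like. The `Embedding` ABC and the `ToyNodeEmbedding` subclass below are hypothetical illustrations, not existing metagraph API; the "training" step is faked with a precomputed lookup table.

```python
from abc import ABC, abstractmethod

import numpy as np


class Embedding(ABC):
    """Hypothetical sketch of the proposed Embedding abstract type."""

    @property
    @abstractmethod
    def input_type(self):
        """Type of each element in the input tuple (e.g. a node type)."""

    @property
    @abstractmethod
    def return_type(self):
        """Concrete Matrix type returned by __call__."""

    @abstractmethod
    def __call__(self, inputs):
        """Map a tuple of inputs to a matrix of shape (len(inputs), embedding_size)."""


class ToyNodeEmbedding(Embedding):
    """Toy concrete embedding: inference is a lookup of precomputed node vectors."""

    def __init__(self, table):
        self._table = table  # node -> vector mapping produced by "training"

    @property
    def input_type(self):
        return int  # nodes are plain ints in this toy example

    @property
    def return_type(self):
        return np.ndarray  # stands in for a concrete Matrix type

    def __call__(self, inputs):
        # Validate inputs against input_type, then stack rows into a matrix.
        assert all(isinstance(x, self.input_type) for x in inputs)
        return np.stack([self._table[x] for x in inputs])


# "Training": pretend an algorithm produced 4-dimensional vectors for nodes 0-2.
rng = np.random.default_rng(0)
table = {n: rng.normal(size=4) for n in range(3)}
embedding = ToyNodeEmbedding(table)

# "Inference": matrix has shape N x embedding_size = 2 x 4.
matrix = embedding((0, 2))
print(matrix.shape)  # (2, 4)
```

The key design point is that the object returned by the training call carries everything inference needs, so the two phases can run at different times (or on different hardware).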
What are your thoughts on this proposal?
## Further Thoughts
It may be the case that the `input_type` and `return_type` properties aren't strictly necessary, but they seem useful for validating inputs. Perhaps we can go without them. If we do, does it make sense to have an embedding type in `metagraph`? Would simply returning a callable be sufficient?
Perhaps they can be useful if we want to embed graphs, nodes, etc. of a different type; the resolver's translator could use these types. This might require us to have a "perform embedding inference" algorithm to be called through the resolver instead of simply using a `__call__` method. Does this route sound reasonable? It would certainly make performing the inference more verbose.
Example:
```python
embedding = res.algos.embedding.node2vec(..., embedding_size=200, ...)
matrix = res.algos.embedding.apply_embedding(embedding, (node_1, ..., node_N))  # matrix has shape N x 200
```
## Embedding Translation
When it comes to the embedding abstract/concrete type, it’s not exactly clear what it means to translate from one to another since the embedding will be a callable. Should we forbid translations on embedding?
## Motivation Behind the Embedding Type
Regarding whether or not an embedding type is motivated, it seems useful to me because we could run the embedding algorithm on the GPU, get our callable that'll return a GPU matrix, use `metagraph` to translate the GPU matrix to PUMA, and then do other stuff with those matrices on PUMA. Even though we don't have many algorithms taking in matrices right now, it still seems useful for the expert users who will take these matrices and do something with them in some other library (e.g. PyTorch, TensorFlow, etc.).
Is this sufficiently motivating? Are there any concerns I did not mention?