AnacletoLAB / grape

🍇 GRAPE is a Rust/Python Graph Representation Learning library for Predictions and Evaluations

TransE error: "ValueError: One of the provided node embedding computed with the TransE method contains NaN values."

realmarcin opened this issue

When generating embeddings for KG-Microbe (KGX edge file from KG-Hub) using TransE, the following error was observed:

ValueError Traceback (most recent call last)
in
----> 1 embedding = model.fit_transform(kg)

~/Library/Python/3.7/lib/python/site-packages/cache_decorator/cache.py in wrapped(*args, **kwargs)
595 if not cache_enabled:
596 self.logger.info("The cache is disabled")
--> 597 result = function(*args, **kwargs)
598 self._check_return_type_compatability(result, self.cache_path)
599 return result

~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/abstract_embedding_model.py in fit_transform(self, graph, return_dataframe, verbose)
164 graph=graph,
165 return_dataframe=return_dataframe,
--> 166 verbose=verbose
167 )
168

~/Library/Python/3.7/lib/python/site-packages/embiggen/embedders/ensmallen_embedders/transe.py in _fit_transform(self, graph, return_dataframe, verbose)
112 embedding_method_name=self.model_name(),
113 node_embeddings= node_embedding,
--> 114 edge_type_embeddings= edge_type_embedding,
115 )
116

~/Library/Python/3.7/lib/python/site-packages/embiggen/utils/abstract_models/embedding_result.py in __init__(self, embedding_method_name, node_embeddings, edge_embeddings, node_type_embeddings, edge_type_embeddings)
76 if np.isnan(numpy_embedding).any():
77 raise ValueError(
---> 78 f"One of the provided {embedding_list_name} "
79 f"computed with the {embedding_method_name} method "
80 "contains NaN values."

ValueError: One of the provided node embedding computed with the TransE method contains NaN values.

I am attaching a Jupyter notebook to reproduce the problem.
load_graph_and.ipynb.zip

The input edge file is here: https://kg-hub.berkeleybop.io/kg-microbe/current/kg-microbe.tar.gz

Hello Marcin, in the provided Jupyter notebook you are loading the edge list using:

kg = Graph.from_csv(
    edge_path="./merged-kg_edges.tsv",
    sources_column_number=0,
    edge_list_edge_types_column_number=1,
    destinations_column_number=2,
    directed=False,
    name="kg-microbe"
)

However, this loads the id column (column 0) as the source nodes, because this file is not a plain triples file like the other one.

[Screenshot, 2022-06-14 at 20:10:40]
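
One way to avoid picking the wrong columns is to resolve the column numbers from the file's header instead of hard-coding them. Here is a minimal sketch, assuming the edge file has a tab-separated header row with columns named `subject`, `predicate` and `object` (the usual KGX names; please check the actual header of your file). The helper `column_numbers` and the example header are hypothetical and not part of grape:

```python
import csv
import io


def column_numbers(header_line, wanted=("subject", "predicate", "object")):
    """Return the 0-based position of each wanted column in a TSV header line."""
    header = next(csv.reader(io.StringIO(header_line), delimiter="\t"))
    missing = [name for name in wanted if name not in header]
    if missing:
        raise ValueError(f"Columns not found in header: {missing}")
    return {name: header.index(name) for name in wanted}


# Illustrative header only; the real file may have different or additional columns.
example_header = "id\tsubject\tpredicate\tobject\trelation"
positions = column_numbers(example_header)
print(positions)
```

The resulting positions could then be passed as `sources_column_number`, `edge_list_edge_types_column_number` and `destinations_column_number` when loading the graph.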

If you load the graph from the automatic retrieval (which points to the same edge list) you should not encounter any issue:

from grape.datasets.kghub import KGMicrobe
kg = KGMicrobe()

Nonetheless, it is interesting that this causes such a peculiar issue. I will look into it.

Hi @LucaCappelletti94, I ran into the same issue when computing embeddings on my graph: the TransE model was run after loading an N-Triples file. Here is a screenshot of the graph loading and the error.

[Screenshot, 2022-06-14 at 7:28:00 PM]

ValueError Traceback (most recent call last)
Input In [17], in <cell line: 1>()
----> 1 embedding = model.fit_transform(npkg)

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/cache_decorator/cache.py:597, in Cache._decorate_function.<locals>.wrapped(*args, **kwargs)
595 if not cache_enabled:
596 self.logger.info("The cache is disabled")
--> 597 result = function(*args, **kwargs)
598 self._check_return_type_compatability(result, self.cache_path)
599 return result

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/embiggen/utils/abstract_models/abstract_embedding_model.py:163, in AbstractEmbeddingModel.fit_transform(self, graph, return_dataframe, verbose)
149 if graph.has_disconnected_nodes():
150 warnings.warn(
151 (
152 f"Please be advised that the {graph.get_name()} graph "
(...)
160 )
161 )
--> 163 result = self._fit_transform(
164 graph=graph,
165 return_dataframe=return_dataframe,
166 verbose=verbose
167 )
169 if not isinstance(result, EmbeddingResult):
170 raise NotImplementedError(
171 f"The embedding result produced by the {self.model_name()} method "
172 f"from the library {self.library_name()} implemented in the class "
173 f"called {self.__class__.__name__} does not return an Embeddingresult "
174 f"but returns an object of type {type(result)}."
175 )

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/embiggen/embedders/ensmallen_embedders/transe.py:111, in TransEEnsmallen._fit_transform(self, graph, return_dataframe, verbose)
102 node_embedding = pd.DataFrame(
103 node_embedding,
104 index=graph.get_node_names()
105 )
106 edge_type_embedding = pd.DataFrame(
107 edge_type_embedding,
108 index=graph.get_unique_edge_type_names()
109 )
--> 111 return EmbeddingResult(
112 embedding_method_name=self.model_name(),
113 node_embeddings= node_embedding,
114 edge_type_embeddings= edge_type_embedding,
115 )

File ~/.conda/envs/faers-embed/lib/python3.8/site-packages/embiggen/utils/abstract_models/embedding_result.py:77, in EmbeddingResult.__init__(self, embedding_method_name, node_embeddings, edge_embeddings, node_type_embeddings, edge_type_embeddings)
74 numpy_embedding = embedding
76 if np.isnan(numpy_embedding).any():
---> 77 raise ValueError(
78 f"One of the provided {embedding_list_name} "
79 f"computed with the {embedding_method_name} method "
80 "contains NaN values."
81 )
83 self._embedding_method_name = embedding_method_name
84 self._node_embeddings = node_embeddings

ValueError: One of the provided node embedding computed with the TransE method contains NaN values.

Hello @sanyabt! Fortunately, your error is most likely caused only by the fact that the graph is loaded as directed and may contain trap nodes. Could you try running kg.get_trap_nodes_number()? If there are any, that is the cause, and I fixed it yesterday (I had forgotten about this corner case).
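
To illustrate what a trap node looks like, here is a minimal pure-Python sketch. It assumes that "trap node" means a node that is reached by at least one edge but has no outgoing edge in the directed graph. It is not how the library computes them, and `find_trap_nodes` and the toy edge list are hypothetical:

```python
def find_trap_nodes(edges):
    """Return the nodes that appear as a destination but never as a source."""
    sources = {src for src, _ in edges}
    destinations = {dst for _, dst in edges}
    return destinations - sources


# Toy directed edge list: "c" can be entered but never left.
toy_edges = [("a", "b"), ("b", "a"), ("b", "c")]
print(find_trap_nodes(toy_edges))  # prints {'c'}
```

When the same edges are loaded as undirected, every edge can be traversed in both directions, so under this definition no trap node remains.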

I have also resolved the corner case presented by the other peculiar undirected graph topology.

Thank you! Do we need to update or reinstall grape for the fix?

Yes, an update will be necessary, but @zommiommy is currently working on @pnrobinson's Printer issue. As soon as that is fixed, we will run the build procedure and deploy the updated version on PyPI. I will notify you here when we do so.

We have added links to our Telegram, Discord and Twitter accounts to the READMEs, so you can easily reach us.

The updated versions have been deployed on PyPI as GraPE version 0.1.3.
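
After upgrading (for example with `pip install --upgrade grape`), the installed version can be checked from Python. Here is a minimal sketch using only the standard library (Python 3.8+). It assumes the distribution is installed under the name `grape` and that the version is a plain dotted number. The helpers `parse_version` and `is_at_least` are hypothetical and not part of grape:

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a plain dotted version such as '0.1.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def is_at_least(installed, required="0.1.3"):
    """Return True if the installed version is the required one or newer."""
    return parse_version(installed) >= parse_version(required)


try:
    installed = version("grape")
    print(installed, "includes the fix:", is_at_least(installed))
except (PackageNotFoundError, ValueError) as error:
    # Not installed, or a version string this simple parser cannot handle.
    print("Could not check the installed grape version:", error)
```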