google-deepmind / graph_nets

Build Graph Nets in Tensorflow

Home Page: https://arxiv.org/abs/1806.01261

Kernel Restart - Incompatibility between nx.draw and utils_tf.data_dicts_to_graphs_tuple

mshearer0 opened this issue

Hi.

I'm trying to use nx.draw and utils_tf.data_dicts_to_graphs_tuple in the same TF2 notebook.

Whichever is executed second seems to cause a kernel restart in the notebook, which I can't explain. Importing networkx is fine as long as nx.draw is not run.
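
Roughly, the two calls in question look like this (a minimal sketch; the feature values and graph are placeholders, not my actual data):

import networkx as nx
import matplotlib.pyplot as plt
from graph_nets import utils_tf

# A tiny graph in the data-dict format graph_nets expects.
data_dict = {
    "globals": [0.0],
    "nodes": [[1.0], [2.0]],
    "edges": [[1.0]],
    "senders": [0],
    "receivers": [1],
}
graphs_tuple = utils_tf.data_dicts_to_graphs_tuple([data_dict])

# Whichever of these two runs second kills the kernel for me.
nx.draw(nx.complete_graph(4))
plt.show()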

@Mistobaan - I get this behaviour on your very helpful TF2 version of graph_nets_basic tutorial.

Michael.

I have not observed this; not sure if @Mistobaan has.

Are you running on your own kernel, or on Google Colaboratory?

In my experience that is usually an out-of-memory case. Check the system logs if you are running on Colab.

Hi, thanks. I'm running on GCP Notebook with 15GB RAM. GCP logs show:

Aug 12 21:12:02 ... bash[1278]: OMP: Error #15: Initializing libiomp5.so, but found libomp.so already initialized.
Aug 12 21:12:02 ... bash[1278]: OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Aug 12 21:12:03 ... bash[1278]: [I 21:12:03.530 LabApp] KernelRestarter: restarting kernel (1/5), keep random ports
Aug 12 21:12:03 ... bash[1278]: kernel ... restarted

I think the answer is printed in your logs:

set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results
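
For reference, a sketch of applying that workaround from Python. It has to run before the conflicting libraries load, and as the log warns it is unsafe and may silently produce incorrect results:

# Unsafe workaround suggested by the OMP error message: tolerate
# duplicate OpenMP runtimes. Must be set before importing
# TensorFlow / networkx / matplotlib in the notebook session.
import os
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

import networkx as nx
import matplotlib.pyplot as plt
from graph_nets import utils_tf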

@Mistobaan - yes, I’ve used that as a workaround but wondered if there was a better option?

Get a bigger machine with more memory? Can you replicate the problem in a Colab and post the link? Make sure to set the share permissions.

Upgrading the GCP Notebook to TensorFlow 2.3 (from 2.2.0) resolved the issue.
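
For anyone checking their own environment, a quick runtime version check (2.3 is what worked here):

import tensorflow as tf

# Should report 2.3.x after the upgrade.
print(tf.__version__)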