Importing torch_cluster in jupyter lab kills the kernel and forces kernel restart

Question

poros-mnemosyne opened this issue a year ago

I recently updated all of my packages to try to get the torch_geometric DataLoader working as intended, and immediately afterwards I could no longer import torch_cluster. When I attempt to import the package, the kernel dies, the kernel restart dialog appears, and the kernel restarts.

I don't get an error message in JupyterLab, and I am not sure where any logs are located, but I am happy to provide additional pertinent info.

Running Python 3.10.9
PyTorch 1.12.1
PyG 2.1.0.post1
torch_cluster 1.6.0

Reproducible code:

import torch_cluster as tc

Edit: It seems Python 3.10.9 was released today, so maybe that is the culprit, since I updated everything in my conda env? PyTorch itself imports fine, but importing PyG kills the kernel in the same way.

poros-mnemosyne · Answer 1 · Tue Jan 24 2023 06:31:03 GMT+0800 (China Standard Time)

I reverted my environment to a prior revision from November, so in practice I am back to where I was before. I am still curious what the issue was, and whether it is related to current issues with PyTorch Geometric or to the new Python release.

Happy to replicate for further troubleshooting if need be, but for now I will be looking into workarounds for the dataloader.

Matthias Fey · Answer 2 · Tue Jan 24 2023 14:14:48 GMT+0800 (China Standard Time)

How did you install torch-cluster in the new environment? Usually, a segfault indicates that it failed to load the C++/CUDA libs related to torch-cluster. Do torch-scatter and torch-sparse work normally for you?
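A crash like this can be inspected without losing the notebook kernel by attempting the import in a child interpreter. The following is a minimal sketch, not an official PyG diagnostic: `probe_import` and `describe` are hypothetical helper names, and the script assumes it runs under the same Python environment as the failing kernel. On POSIX systems a negative return code means the child was killed by a signal, so a segfault typically shows up as SIGSEGV.

```python
import signal
import subprocess
import sys


def probe_import(module_name, timeout=300):
    """Import module_name in a child interpreter so a crash cannot kill this process.

    Returns (returncode, stderr). The -X faulthandler option asks the child
    to dump a Python traceback to stderr if it receives a fatal signal.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative code is the number of the signal that killed the child.
        try:
            return "killed by " + signal.Signals(-returncode).name
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exited with code {returncode}"


if __name__ == "__main__":
    for name in ("torch", "torch_scatter", "torch_sparse", "torch_cluster"):
        code, err = probe_import(name)
        print(f"{name}: {describe(code)}")
        if code != 0:
            # Show the end of stderr, where the traceback or loader error appears.
            print(err.strip()[-2000:])
```

If torch imports cleanly while one of the extension packages is killed by a signal, that points toward the compiled libraries of that package rather than Python or PyTorch itself.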

poros-mnemosyne · Answer 3 · Wed Jan 25 2023 01:50:06 GMT+0800 (China Standard Time)

I re-created the error-causing environment just a few moments ago using conda create ... --clone ... and then ran conda update --all, and the issue seems to have disappeared. I'm guessing it was either the good old "turn it off and on again", or some other package update in the last 18 hours fixed it. It could also be down to other changes I made while restoring the initial environment, which may have undone some prior mistake or a missing package/setup step that caused the error.

I'm using a CPU-only setup (SSH to a Debian Linux server with no GPU), so that may be relevant. If I did not have it set up properly, maybe the issue was an attempt to load incompatible CUDA libs?
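One way to check the CPU/CUDA hypothesis is to compare what PyTorch reports about its own build with the versions of the installed extension packages, without importing the extensions themselves. This is a rough sketch under assumptions: `installed_version` and `torch_build_info` are hypothetical helpers, and the distribution names below are assumed to be how the packages are registered in the environment's metadata.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent.

    This reads package metadata only, so it does not load any compiled
    extension and cannot trigger the crash.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def torch_build_info():
    """Report the PyTorch version and whether it is a CUDA or CPU-only build."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "torch": torch.__version__,
        # torch.version.cuda is None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print("torch build:", torch_build_info())
    # Distribution names are assumptions; adjust if your environment differs.
    for dist in ("torch-scatter", "torch-sparse", "torch-cluster", "torch-geometric"):
        print(f"{dist}: {installed_version(dist)}")
```

On a machine with no GPU, `cuda_build` being a version string rather than `None` would suggest that a CUDA build of PyTorch is installed, which is worth comparing against how the extension packages were built.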

I'll close this issue, then update the main env and confirm the issue does not persist.

Edit: After updating the main env (conda update --all), import torch_cluster as tc works without issue, just as it did in the test env. The extra good news is that the data loader issue was also resolved by updating, as I had hoped it would be!