RingBDStack / SUGAR

Code for "SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism"


Gradient explosion and memory error in DD, NCI1, NCI109

QiyaoHuang opened this issue · comments

  1. In DD, preprocessing crashes with a MemoryError (a sparse-matrix sketch addressing this follows after this list):
     Traceback (most recent call last):
       File "transform.py", line 443, in <module>
         adjs, d_es = paser.main(save=True)
       File "transform.py", line 153, in main
         d_es, adj_com = self.ex_edges()
       File "transform.py", line 87, in ex_edges
         adj = np.zeros((self.n, self.n))
     numpy.core._exceptions.MemoryError: Unable to allocate 112. GiB for an array with shape (122494, 122494) and data type float64
  2. In NCI1 and NCI109, the loss becomes NaN almost immediately:
     folds 1/10: 0%| | 1/1000 [04:11<41:05:52, 148.10s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 2/1000 [04:11<33:48:52, 121.98s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 2/1000 [05:52<33:48:52, 121.98s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 3/1000 [05:52<31:02:52, 112.11s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
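The MemoryError above comes from allocating a dense 122494 × 122494 float64 matrix (about 112 GiB). As a minimal sketch, not part of the released code, one workaround is to build the adjacency as a SciPy sparse matrix from the edge list; the `edges` iterable of (src, dst) index pairs is an assumption and would need to match however `ex_edges()` in transform.py enumerates edges.

```python
import numpy as np
import scipy.sparse as sp

def build_sparse_adj(n, edges):
    """Build an n x n adjacency matrix from (src, dst) pairs without densifying.

    `edges` is a hypothetical iterable of node-index pairs; adapt it to the
    actual edge enumeration in transform.py.
    """
    rows, cols = zip(*edges)
    data = np.ones(len(rows), dtype=np.float32)
    adj = sp.coo_matrix((data, (rows, cols)), shape=(n, n))
    adj = adj + adj.T   # symmetrize, assuming each undirected edge is listed once
    return adj.tocsr()  # memory scales with the number of edges, not n^2
```

With this layout, memory use is proportional to the number of edges rather than n², so even a 122k-node graph fits comfortably in RAM.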
  1. It does consume a lot of memory; thank you for the warning. We are about to package the binary DD output and release it here.

  2. A high learning rate, the environment version, a bad random initialization, or some other factor may cause this. I tried NCI1 and NCI109 just now and they ran normally on my machine.
     You could try other parameter settings, adding intermediate outputs, or using gradient clipping (see the sketch after this list).
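The gradient clipping mentioned above could look like the following. This is only a hedged sketch assuming a TensorFlow 2 style custom training loop; `model`, `optimizer`, `loss_fn`, `x`, and `y` are placeholders rather than names from the SUGAR code, and the clip norm would need tuning.

```python
import tensorflow as tf

def train_step(model, optimizer, loss_fn, x, y, clip_norm=5.0):
    """One training step with global-norm gradient clipping (all names are placeholders)."""
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    grads, _ = tf.clip_by_global_norm(grads, clip_norm)  # cap the global gradient norm
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

Lowering the learning rate together with clipping is often enough to keep the loss from diverging to NaN.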

https://pan.baidu.com/s/1rdYhypHCebxBknMrIzAubg password: eqlo
Note that the features and the subgraphs have been split into the folders "features" and "subadj" to work around your memory problem; you only need to change the code slightly to run (you don't need to load them all at once, just load the specific graph you need during training).
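A minimal sketch of that per-graph lazy loading is below. The directory layout (one .npy file per graph under "features" and "subadj") and the naming scheme are assumptions, so adjust them to match the released archive.

```python
import os
import numpy as np

def load_graph(data_dir, graph_id):
    """Load one graph's features and subgraph adjacency on demand (file layout is assumed)."""
    feat = np.load(os.path.join(data_dir, "features", f"{graph_id}.npy"))
    sub_adj = np.load(os.path.join(data_dir, "subadj", f"{graph_id}.npy"))
    return feat, sub_adj

# Hypothetical use inside the training loop:
# for graph_id in batch_ids:
#     feat, sub_adj = load_graph("DD", graph_id)
#     ...  # feed this single graph to the model
```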