RingBDStack / SUGAR

Code for "SUGAR: Subgraph Neural Network with Reinforcement Pooling and Self-Supervised Mutual Information Mechanism"


Gradient explosion and memory error in DD, NCI1, NCI109

QiyaoHuang opened this issue · comments

  1. In DD, preprocessing crashes with a MemoryError (a sparse-matrix sketch addressing this follows after this list):
     Traceback (most recent call last):
       File "transform.py", line 443, in <module>
         adjs, d_es = paser.main(save=True)
       File "transform.py", line 153, in main
         d_es, adj_com = self.ex_edges()
       File "transform.py", line 87, in ex_edges
         adj = np.zeros((self.n, self.n))
     numpy.core._exceptions.MemoryError: Unable to allocate 112. GiB for an array with shape (122494, 122494) and data type float64
  2. In NCI1 and NCI109, the loss becomes NaN almost immediately:
     folds 1/10: 0%| | 1/1000 [04:11<41:05:52, 148.10s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 2/1000 [04:11<33:48:52, 121.98s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 2/1000 [05:52<33:48:52, 121.98s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
     folds 1/10: 0%| | 3/1000 [05:52<31:02:52, 112.11s/it, k:0.80, loss: nan, best_acc:1.00, RL:0]
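The MemoryError above comes from allocating a dense 122494 × 122494 float64 matrix (about 112 GiB). As a minimal sketch, not part of the released code, one workaround is to build the adjacency as a SciPy sparse matrix from the edge list; the `edges` iterable of (src, dst) index pairs is an assumption and would need to match however `ex_edges()` in transform.py enumerates edges.

```python
import numpy as np
import scipy.sparse as sp

def build_sparse_adj(n, edges):
    """Build an n x n adjacency matrix from (src, dst) pairs without densifying.

    `edges` is a hypothetical iterable of node-index pairs; adapt it to the
    actual edge enumeration in transform.py.
    """
    rows, cols = zip(*edges)
    data = np.ones(len(rows), dtype=np.float32)
    adj = sp.coo_matrix((data, (rows, cols)), shape=(n, n))
    adj = adj + adj.T   # symmetrize, assuming each undirected edge is listed once
    return adj.tocsr()  # memory scales with the number of edges, not n^2
```

With this layout, memory use is proportional to the number of edges rather than n², so even a 122k-node graph fits comfortably in RAM.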
  1. It does consume a lot of memory; thank you for the warning. We are about to package the binary DD output and release it here.

  2. A high learning rate, the environment version, a bad random initialization, or some other factor may cause this. I tried NCI1 and NCI109 just now and they ran normally on my machine.
     You could try other parameter settings, adding intermediate outputs, or using gradient clipping (see the sketch after this list).
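The gradient clipping mentioned above could look like the following. This is only a hedged sketch assuming a TensorFlow 2 style custom training loop; `model`, `optimizer`, `loss_fn`, `x`, and `y` are placeholders rather than names from the SUGAR code, and the clip norm would need tuning.

```python
import tensorflow as tf

def train_step(model, optimizer, loss_fn, x, y, clip_norm=5.0):
    """One training step with global-norm gradient clipping (all names are placeholders)."""
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    grads, _ = tf.clip_by_global_norm(grads, clip_norm)  # cap the global gradient norm
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```

Lowering the learning rate together with clipping is often enough to keep the loss from diverging to NaN.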

https://pan.baidu.com/s/1rdYhypHCebxBknMrIzAubg password: eqlo
Note that the features and the subgraphs have been split into the folders "features" and "subadj" to work around your memory problem; you only need to change the code slightly to run (you don't need to load them all at once, just load the specific graph you need during training).
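A minimal sketch of that per-graph lazy loading is below. The directory layout (one .npy file per graph under "features" and "subadj") and the naming scheme are assumptions, so adjust them to match the released archive.

```python
import os
import numpy as np

def load_graph(data_dir, graph_id):
    """Load one graph's features and subgraph adjacency on demand (file layout is assumed)."""
    feat = np.load(os.path.join(data_dir, "features", f"{graph_id}.npy"))
    sub_adj = np.load(os.path.join(data_dir, "subadj", f"{graph_id}.npy"))
    return feat, sub_adj

# Hypothetical use inside the training loop:
# for graph_id in batch_ids:
#     feat, sub_adj = load_graph("DD", graph_id)
#     ...  # feed this single graph to the model
```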