Hanjun-Dai / GLN

Implementation of Retrosynthesis Prediction with Conditional Graph Logic Network

Segmentation fault

fengjiaxin opened this issue · comments

Hi, excuse me.
I hit another issue when training the model: a segmentation fault (core dumped).
Could you update the code? I have no idea how to solve this problem.

Also:
I think GLN/gln/mods/mol_gnn/gnn_family/utils.py could be updated by replacing .cuda() with .to(DEVICE), roughly as in the sketch below.
Thanks a lot.
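
(A minimal sketch of what I mean; DEVICE here is a hypothetical module-level setting derived from the -gpu flag, not the repo's actual configuration.)

```python
import torch

# Hypothetical module-level device, picked once from a -gpu style flag.
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

def to_device(t):
    # Drop-in replacement for hard-coded t.cuda() calls, so the same code
    # path works on both CPU-only and GPU machines.
    return t.to(DEVICE)
```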

Could you please provide more details about the segfault?

./run_mf.sh: line 60:  9301 Segmentation fault (core dumped) python ../main.py -gm $gm -fp_degree 2 -neg_sample $neg_sample -att_type $att_type -gnn_out $gnn_out -tpl_enc $tpl_enc -subg_enc $subg_enc -latent_dim $msg_dim -bn $bn -gen_method $gen -retro_during_train $retro -neg_num $neg_size -embed_dim $embed_dim -readout_agg_type $graph_agg -act_func $act -act_last True -max_lv $lv -dropbox $dropbox -data_name $data_name -save_dir $save_dir -tpl_name $tpl_name -f_atoms $dropbox/cooked_$data_name/atom_list.txt -iters_per_val 3000 -gpu 1 -topk 50 -beam_size 50 -num_parts 1

There is no other information. I don't think it's an environment issue.

Are you able to run the test with the existing model dumps?

And did you modify the script?

I use -gpu 0 in the script. Please try with the vanilla code and see if that works.

I got another issue: a GPU CUDA error.
Was the checkpoint file saved on GPU?

I use -gpu 1. Did you save the model on GPU 0? When I run the test script I get the following error:

Traceback (most recent call last):
  File "main_test.py", line 139, in <module>
    model = RetroGLN(cmd_args.dropbox, local_args.model_for_test)
  File "/home/fengjiaxin/GLN/gln/test/model_inference.py", line 43, in __init__
    self.gln.load_state_dict(torch.load(model_file))
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 613, in _load
    result = unpickler.load()
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 576, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 155, in default_restore_location
    result = fn(storage, location)
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 135, in _cuda_deserialize
    return storage_type(obj.size())
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/cuda/__init__.py", line 634, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory

Yes, it uses the GPU by default. Please always use -gpu 0 in your script.
If you want to change the GPU, please use CUDA_VISIBLE_DEVICES instead.
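
With CUDA_VISIBLE_DEVICES the process only sees the GPUs you list, so physical GPU 1 shows up as cuda:0 and -gpu 0 inside the script keeps working (e.g. CUDA_VISIBLE_DEVICES=1 ./run_mf.sh). A minimal sketch of the same restriction from Python, assuming the machine has at least two GPUs:

```python
import os

# Must be set before torch initializes CUDA; physical GPU 1 is then
# exposed to this process as cuda:0.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
print(torch.cuda.device_count())    # 1 visible device
print(torch.cuda.current_device())  # 0, which maps to physical GPU 1
```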

Hi, I debugged the code. The error occurs at GLN/gln/graph_logic/soft_logic.py line 29,
in jagged_forward: graph_embed = graph_enc(list).
There is no other information.
Could you briefly introduce your code? I cannot find the error.
Thanks.

Could you provide a Docker image? I think it would be useful.

graph_enc is from another sub-package in this repo.

Can you first try without the GPU? Please take a look at this:
https://discuss.pytorch.org/t/on-a-cpu-device-how-to-load-checkpoint-saved-on-gpu-device/349

It shows how to load a GPU dump onto the CPU.
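
In case it helps, the usual pattern from that thread is to remap the checkpoint to CPU at load time via map_location; a minimal sketch (the checkpoint path below is just a placeholder):

```python
import torch

model_file = "path/to/model_dump.ckpt"  # placeholder path for illustration

# map_location remaps every CUDA tensor in the checkpoint to CPU during
# deserialization, so loading allocates no GPU memory.
state_dict = torch.load(model_file, map_location=torch.device("cpu"))

# The model can then be moved to whichever device is actually available:
# model.load_state_dict(state_dict); model.to(device)
```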

Hi, I debugged the training file and the test file and got the same error; it is not a CUDA error.
Would you briefly introduce your code? Thanks.

If the error is happening on that line, you may want to double check
https://github.com/Hanjun-Dai/GLN/blob/master/gln/mods/mol_gnn/gnn_family/utils.py#L64

Note that different graph NN implementations override this function.
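
One generic way to narrow it down (a debugging sketch, not GLN-specific; report_devices is a hypothetical helper) is to print where the encoder's parameters and its inputs live right before the failing call, since CPU/GPU mismatches often surface as opaque crashes:

```python
import torch

def report_devices(module, *tensors):
    # Hypothetical helper: list the devices of a module's parameters and of
    # the tensors about to be fed into it, to spot CPU/GPU mismatches.
    print("parameter devices:", {str(p.device) for p in module.parameters()})
    for i, t in enumerate(tensors):
        if torch.is_tensor(t):
            print(f"input {i} device: {t.device}")
```

Calling something like report_devices(graph_enc, ...) right before the graph_embed = graph_enc(...) line in jagged_forward should show whether everything sits on the same device.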