pengxingang / Pocket2Mol

Pocket2Mol: Efficient Molecular Sampling Based on 3D Protein Pockets

[train::ERROR] Runtime Error Pin memory thread exited unexpectedly

caiyingchun opened this issue

I tried to train a new model by running train.py, but I got this:

[2023-06-28 10:32:08,821::train::INFO] Namespace(config='./configs/train.yml', device='cuda', logdir='./logs')
[2023-06-28 10:32:08,821::train::INFO] {'model': {'vn': 'vn', 'hidden_channels': 256, 'hidden_channels_vec': 64, 'encoder': {'name': 'cftfm', 'hidden_channels': 256, 'hidden_channels_vec': 64, 'edge_channels': 64, 'key_channels': 128, 'num_heads': 4, 'num_interactions': 6, 'cutoff': 10.0, 'knn': 48}, 'field': {'name': 'classifier', 'num_filters': 128, 'num_filters_vec': 32, 'edge_channels': 64, 'num_heads': 4, 'cutoff': 10.0, 'knn': 32}, 'position': {'num_filters': 128, 'n_component': 3}}, 'train': {'seed': 2023, 'use_apex': False, 'batch_size': 8, 'num_workers': 8, 'pin_memory': True, 'max_iters': 500000, 'val_freq': 5000, 'pos_noise_std': 0.1, 'max_grad_norm': 100.0, 'optimizer': {'type': 'adam', 'lr': 0.0002, 'weight_decay': 0, 'beta1': 0.99, 'beta2': 0.999}, 'scheduler': {'type': 'plateau', 'factor': 0.6, 'patience': 8, 'min_lr': 1e-05}, 'transform': {'mask': {'type': 'mixed', 'min_ratio': 0.0, 'max_ratio': 1.1, 'min_num_masked': 1, 'min_num_unmasked': 0, 'p_random': 0.15, 'p_bfs': 0.6, 'p_invbfs': 0.25}, 'contrastive': {'num_real': 20, 'num_fake': 20, 'pos_real_std': 0.05, 'pos_fake_std': 2.0}, 'edgesampler': {'k': 8}}}, 'dataset': {'name': 'pl', 'path': './data/crossdocked_pocket10', 'split': './data/split_by_name.pt'}}
[2023-06-28 10:32:08,823::train::INFO] Loading dataset...
[2023-06-28 10:32:09,280::train::INFO] Building model...
Num of parameters is 3711167
/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 49, in _pin_memory_loop
    do_one_step()
  File "/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/site-packages/torch/utils/data/_utils/pin_memory.py", line 26, in do_one_step
    r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL)
  File "/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/multiprocessing/queues.py", line 113, in get
    return _ForkingPickler.loads(res)
  File "/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 305, in rebuild_storage_fd
    fd = df.detach()
  File "/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/multiprocessing/resource_sharer.py", line 58, in detach
    return reduction.recv_handle(conn)
  File "/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/multiprocessing/reduction.py", line 185, in recv_handle
    return recvfds(s, 1)[0]
  File "/data/sdb/opt/miniconda3/envs/aidd/lib/python3.7/multiprocessing/reduction.py", line 161, in recvfds
    len(ancdata))
RuntimeError: received 0 items of ancdata

[2023-06-28 10:32:12,068::train::INFO] [Train] Iter 1 | Loss 10.276641 | Loss(Fron) 0.631725 | Loss(Pos) 3.812413 | Loss(Cls) 1.901050 | Loss(Edge) 1.675482 | Loss(Real) 0.126777 | Loss(Fake) 2.129193 | Loss(Surf) 0.000000
[2023-06-28 10:32:12,073::train::ERROR] Runtime Error Pin memory thread exited unexpectedly
Traceback (most recent call last):
  File "train.py", line 227, in <module>
    train(it)
  File "train.py", line 108, in train
    batch = next(train_iterator).to(args.device)
StopIteration

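Try adding torch.multiprocessing.set_sharing_strategy('file_system') at the top of train.py, before any DataLoader is created. The "received 0 items of ancdata" error is commonly reported when the DataLoader workers run out of open file descriptors under the default file_descriptor sharing strategy on Linux; the file_system strategy identifies shared memory by file name instead of passing descriptors between processes.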
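
A minimal sketch of that change, assuming it goes right after the imports in train.py and runs before the training DataLoader is built. The helper open_file_limits is hypothetical and not part of Pocket2Mol; it only reports the per-process open-file limit through the Unix-only resource module, which may help confirm whether descriptor exhaustion is plausible on your machine.

```python
import resource  # Unix-only standard-library module

import torch.multiprocessing as mp

# Switch the tensor-sharing strategy before any DataLoader worker starts.
# "file_system" identifies shared-memory regions by file name, so worker
# processes do not need to pass file descriptors to the main process.
mp.set_sharing_strategy("file_system")


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors.

    Hypothetical diagnostic helper, not part of Pocket2Mol.
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


# Sanity check: show the strategy in effect and the current limits.
print("sharing strategy:", mp.get_sharing_strategy())
print("open-file limits (soft, hard):", open_file_limits())
```

If the soft limit turns out to be low (1024 is a common default), raising it with ulimit -n in the shell that launches training is another workaround people report for this error. Lowering num_workers in configs/train.yml should also reduce the number of descriptors in flight.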