microsoft / protein-frame-flow

Fast protein backbone generation with SE(3) flow matching.

Metadata has hardcoded paths which prevent training from being run

chaitjo opened this issue

Thanks for the great repository!

I've been unable to run training after setting up the repository, because the datamodule appears to load preprocessed data from hardcoded absolute paths that do not exist on my system.

Here's an example output:

$ python -W ignore experiments/train_se3_flows.py
[2024-02-08 17:50:16,382][__main__][INFO] - Checkpoints saved to ckpt/se3-fm/baseline/2024-02-08_17-49-56
[2024-02-08 17:50:16,436][__main__][INFO] - Using devices: [0]
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/1
[2024-02-08 17:50:17,002][torch.distributed.distributed_c10d][INFO] - Added key: store_based_barrier_key:1 to store for rank: 0
[2024-02-08 17:50:17,003][torch.distributed.distributed_c10d][INFO] - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
----------------------------------------------------------------------------------------------------
distributed_backend=nccl
All distributed processes registered. Starting with 1 processes
----------------------------------------------------------------------------------------------------

[2024-02-08 17:50:18,461][data.pdb_dataloader][INFO] - Training: 3938 examples
[2024-02-08 17:50:18,531][data.pdb_dataloader][INFO] - Validation: 40 examples with lengths [ 20  38  53  68  83  98 113 128]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name  | Type      | Params
------------------------------------
0 | model | FlowModel | 16.7 M
------------------------------------
16.7 M    Trainable params
0         Non-trainable params
16.7 M    Total params
66.984    Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]Failed to read /data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2voua2.pkl. First error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2voua2.pkl'
 Second error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2voua2.pkl'
Failed to read /data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2ymza1.pkl. First error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2ymza1.pkl'
 Second error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2ymza1.pkl'
Failed to read /data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2xw6a1.pkl. First error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2xw6a1.pkl'
 Second error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2xw6a1.pkl'
Failed to read /data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2hewf1.pkl. First error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2hewf1.pkl'
 Second error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2hewf1.pkl'
Failed to read /data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2xw6a1.pkl. First error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2xw6a1.pkl'
 Second error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2xw6a1.pkl'
Failed to read /data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2voua2.pkl. First error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2voua2.pkl'
 Second error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2voua2.pkl'
Failed to read /data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2hewf1.pkl. First error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2hewf1.pkl'
 Second error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2hewf1.pkl'
Failed to read /data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2ymza1.pkl. First error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2ymza1.pkl'
 Second error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2ymza1.pkl'
Failed to read /data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d5uj5a1.pkl. First error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d5uj5a1.pkl'
 Second error: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d5uj5a1.pkl'
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/ckj24/protein-frame-flow/experiments/train_se3_flows.py", line 97, in main
    exp.train()
  File "/home/ckj24/protein-frame-flow/experiments/train_se3_flows.py", line 72, in train
    trainer.fit(
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 532, in fit
    call._call_and_handle_interrupt(
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 93, in launch
    return function(*args, **kwargs)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 571, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 980, in _run
    results = self._run_stage()
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1021, in _run_stage
    self._run_sanity_check()
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1050, in _run_sanity_check
    val_loop.run()
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/loops/utilities.py", line 181, in _decorator
    return loop_run(self, *args, **kwargs)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/loops/evaluation_loop.py", line 108, in run
    batch, batch_idx, dataloader_idx = next(data_fetcher)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 137, in __next__
    self._fetch_next_batch(self.dataloader_iter)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/loops/fetchers.py", line 151, in _fetch_next_batch
    batch = next(iterator)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 285, in __next__
    out = next(self._iterator)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/pytorch_lightning/utilities/combined_loader.py", line 123, in __next__
    out = next(self.iterators[0])
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ckj24/protein-frame-flow/data/utils.py", line 195, in read_pkl
    with open(read_path, 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2voua2.pkl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ckj24/miniforge-pypy3/envs/fm/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ckj24/protein-frame-flow/data/pdb_dataloader.py", line 157, in __getitem__
    chain_feats = self._process_csv_row(processed_file_path)
  File "/home/ckj24/protein-frame-flow/data/pdb_dataloader.py", line 119, in _process_csv_row
    processed_feats = du.read_pkl(processed_file_path)
  File "/home/ckj24/protein-frame-flow/data/utils.py", line 200, in read_pkl
    raise(e)
  File "/home/ckj24/protein-frame-flow/data/utils.py", line 191, in read_pkl
    with open(read_path, 'rb') as handle:
FileNotFoundError: [Errno 2] No such file or directory: '/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d2voua2.pkl'

Here's what preprocessed/metadata.csv looks like:

pdb_name,processed_path,raw_path,num_chains,quaternary_category,seq_len,modeled_seq_len,coil_percent,helix_percent,strand_percent,radius_gyration
d1hp1a2,/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d1hp1a2.pkl,/data/rsg/chemistry/jyim/large_data/scope/d1hp1a2.pdb,1,homomer,328,328,0.45121951219512196,0.2682926829268293,0.2804878048780488,1.9195774410000415
d1w25a2,/data/rsg/chemistry/jyim/projects/flow-matching/preprocessed/d1w25a2.pkl,/data/rsg/chemistry/jyim/large_data/scope/d1w25a2.pdb,1,homomer,153,153,0.43137254901960786,0.4117647058823529,0.1568627450980392,1.64664663551551
...

I changed all the hardcoded paths in the metadata to relative paths pointing at the preprocessed data, which fixed the issue and let training run. The maintainers may still want to fix this in a subsequent release.
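For anyone hitting the same error, here is a minimal sketch of that kind of rewrite. It is not the exact change made above, only one way to do it with the standard library. The column names `processed_path` and `raw_path` come from the metadata header shown earlier; the function names and the idea of writing to a separate output file are assumptions for illustration.

```python
import csv
from pathlib import PurePosixPath


def rebase_row(row, new_dirs):
    """Return a copy of `row` whose path columns point into new directories.

    `new_dirs` maps a column name (e.g. "processed_path") to the directory
    that should replace the hardcoded one. Only the directory part changes;
    the file name (e.g. "d1hp1a2.pkl") is kept. Columns not listed in
    `new_dirs` are left untouched.
    """
    fixed = dict(row)
    for column, new_dir in new_dirs.items():
        filename = PurePosixPath(row[column]).name
        fixed[column] = str(PurePosixPath(new_dir) / filename)
    return fixed


def rebase_metadata(src_csv, dst_csv, new_dirs):
    """Rewrite the path columns of a metadata CSV and return the row count.

    Hypothetical helper: reads every row from `src_csv`, rebases the listed
    path columns, and writes the result to `dst_csv` with the same header.
    """
    with open(src_csv, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        rows = [rebase_row(row, new_dirs) for row in reader]
    with open(dst_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling `rebase_metadata("preprocessed/metadata.csv", "preprocessed/metadata.csv", {"processed_path": "preprocessed"})` from the repository root would overwrite the file in place with paths like `preprocessed/d1hp1a2.pkl`. Relative paths only resolve if training is launched from that same directory, so an absolute path to your local copy may be the safer choice. Back up the original CSV first.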

Yeah, this is a bug. I plan to release code for motif-scaffolding, at which point I will update the datasets and metadata. Thanks for pointing this out so I remember.