ValueError: Caught ValueError in DataLoader worker process 0.
Qcatbot opened this issue
I tried to run atria_segmentation_2018 in Colab and got the following error. Any input on how to resolve it would be much appreciated:
orientation: 0
dim, 81
Starting training:
Model_name:atria_mt_model
Epochs: 2000
Batch size: 5
Learning rate: 0.001
lr_policy = [step]
No checkpoint found at 'models/atria_mt_model.pkl'
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
"https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:154: UserWarning: The epoch parameter in scheduler.step() was not necessary and is being deprecated where possible. Please use scheduler.step() to step the scheduler. During the deprecation, if epoch is different from None, the closed form is used instead of the new chainable form, where available. Please open an issue if you are unable to replicate your use case: https://github.com/pytorch/pytorch/issues/new/choose.
warnings.warn(EPOCH_DEPRECATION_WARNING, UserWarning)
Number of train images: 75 nrrds
loading csv as extra label
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
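As a side note, the worker warning above can usually be avoided by capping num_workers at the machine's CPU count instead of the hard-coded 16. A minimal sketch, assuming a standard DataLoader setup (the toy dataset and sizes here are made up for illustration, not the repository's code):

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the atria slices (illustrative only).
ds = TensorDataset(torch.arange(10.0))

# Colab VMs typically expose only 2 CPUs; cap the workers accordingly
# instead of the hard-coded 16 that triggered the warning.
num_workers = min(2, os.cpu_count() or 1)
loader = DataLoader(ds, batch_size=5, num_workers=num_workers)

print(len(list(loader)))  # 10 samples / batch size 5 -> 2 batches
```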
Number of validate images: 25 nrrds
Preloading the validate dataset ...
Loading is done
loading csv as extra label
/usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:370: UserWarning: To get the last learning rate computed by the scheduler, please use get_last_lr().
"please use get_last_lr().", UserWarning)
Starting training:
model_name:atria_mt_model
lr: [0.001]
Image size: 256
Training size: 75
Validation size: 25
Checkpoints: True
CUDA: True
Starting epoch 1/2000.
0% 0/15 [00:00<?, ?it/s]Traceback (most recent call last):
File "train_atria_seg.py", line 261, in <module>
n_classes=options.n_classes, gpu=options.gpu,if_clahe=options.enhance, if_gamma_correction=options.gamma,if_mip=options.mip)
File "train_atria_seg.py", line 104, in train_net
for epoch_iter,data in tqdm(enumerate(train_loader, 1), total=len(train_loader)):
File "/usr/local/lib/python3.7/dist-packages/tqdm/_tqdm.py", line 941, in __iter__
for obj in iterable:
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 434, in reraise
raise exception
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/content/MyDrive/Internship_VAR/2018_Atrial_Segmentation_Challenge_COMPLETE_DATASET/VRAI Heart/atria_segmentation_2018/data_io/atria_dataset.py", line 144, in __getitem__
new_input,new_target = self.pair_transform(input,target,self.input_h,self.input_w)
File "/content/MyDrive/Internship_VAR/2018_Atrial_Segmentation_Challenge_COMPLETE_DATASET/VRAI Heart/atria_segmentation_2018/data_io/atria_dataset.py", line 186, in pair_transform
image, label = data_aug(image, label)
File "/content/MyDrive/Internship_VAR/2018_Atrial_Segmentation_Challenge_COMPLETE_DATASET/VRAI Heart/atria_segmentation_2018/data_io/data_augmentation.py", line 26, in __call__
img, mask = a(img, mask)
File "/content/MyDrive/Internship_VAR/2018_Atrial_Segmentation_Challenge_COMPLETE_DATASET/VRAI Heart/atria_segmentation_2018/data_io/data_augmentation.py", line 111, in __call__
t1_slice_tform, mask_slice_tform = tform(img, mask)
ValueError: not enough values to unpack (expected 2, got 1)
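The final ValueError means the transform returned a single value where the caller expects an (image, mask) pair. A stripped-down illustration of the failure mode, with hypothetical helper names standing in for the repository's transform objects:

```python
# The traceback's last frame unpacks the transform's return value:
#     t1_slice_tform, mask_slice_tform = tform(img, mask)
# If tform returns one value instead of an (image, mask) pair, Python
# raises exactly the error seen above. (broken_tform / fixed_tform are
# hypothetical names, not the repository's code.)

def broken_tform(img, mask):
    return (img,)            # one value: unpacking into two names fails

def fixed_tform(img, mask):
    return img, mask         # a 2-tuple keeps the caller's unpacking valid

img, mask = "image-array", "mask-array"

try:
    t1, m1 = broken_tform(img, mask)
except ValueError as exc:
    print(exc)               # not enough values to unpack (expected 2, got 1)

t1, m1 = fixed_tform(img, mask)  # succeeds
```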
Hi, this error was introduced by the PyTorch upgrade, where the scheduler must now be called after optimizer.step(). We have fixed this bug. Please let me know if it doesn't work.