bowang-lab / U-Mamba

U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

Home Page: https://arxiv.org/abs/2401.04722


Prediction-RuntimeError: Some background workers are no longer alive

Saul62 opened this issue

Hello, I encountered an issue when validating the 701 dataset: the program throws an error after predicting 9 cases.
The problem arises when the Python multiprocessing module is used for parallel computation. Based on the error messages, there are two related failures: RuntimeError: Some background workers are no longer alive, and a multiprocessing.managers.RemoteError wrapping a KeyError. How can this be resolved?

The error message is as follows:

Predicting FLARETs_0010:
perform_everything_on_device: True
Traceback (most recent call last):
  File "/root/miniconda3/envs/umamba/bin/nnUNetv2_predict", line 33, in <module>
    sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_predict')())
  File "/root/onethingai-tmp/U-Mamba/umamba/nnunetv2/inference/predict_from_raw_data.py", line 831, in predict_entry_point
    predictor.predict_from_files(args.i, args.o, save_probabilities=args.save_probabilities,
  File "/root/onethingai-tmp/U-Mamba/umamba/nnunetv2/inference/predict_from_raw_data.py", line 250, in predict_from_files
    return self.predict_from_data_iterator(data_iterator, save_probabilities, num_processes_segmentation_export)
  File "/root/onethingai-tmp/U-Mamba/umamba/nnunetv2/inference/predict_from_raw_data.py", line 366, in predict_from_data_iterator
    proceed = not check_workers_alive_and_busy(export_pool, worker_list, r, allowed_num_queued=2)
  File "/root/onethingai-tmp/U-Mamba/umamba/nnunetv2/utilities/file_path_utilities.py", line 103, in check_workers_alive_and_busy
    raise RuntimeError('Some background workers are no longer alive')
RuntimeError: Some background workers are no longer alive
Process SpawnProcess-8:
Traceback (most recent call last):
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/root/onethingai-tmp/U-Mamba/umamba/nnunetv2/inference/data_iterators.py", line 57, in preprocess_fromfiles_save_to_queue
    raise e
  File "/root/onethingai-tmp/U-Mamba/umamba/nnunetv2/inference/data_iterators.py", line 50, in preprocess_fromfiles_save_to_queue
    target_queue.put(item, timeout=0.01)
  File "<string>", line 2, in put
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/managers.py", line 833, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/managers.py", line 260, in serve_client
    self.id_to_local_proxy_obj[ident]
KeyError: '7ff7fdd1f460'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/managers.py", line 262, in serve_client
    raise ke
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/managers.py", line 256, in serve_client
    obj, exposed, gettypeid = id_to_obj[ident]
KeyError: '7ff7fdd1f460'
---------------------------------------------------------------------------
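A workaround worth trying (not an official fix): the RemoteError/KeyError is usually just a secondary symptom of a preprocessing or export worker being killed, for example by the out-of-memory killer when RAM or shared memory runs out. Running prediction with fewer background processes makes the real failure easier to see and often avoids it. The sketch below assumes the nnUNetv2 predictor API bundled with U-Mamba; all paths are placeholders, and argument names may differ slightly between versions (the CLI equivalent would be the -npp 1 -nps 1 flags of nnUNetv2_predict).

import torch
from nnunetv2.inference.predict_from_raw_data import nnUNetPredictor

# Same setting as in the log above: everything on the GPU.
predictor = nnUNetPredictor(
    tile_step_size=0.5,
    use_gaussian=True,
    use_mirroring=True,
    perform_everything_on_device=True,
    device=torch.device('cuda', 0),
    verbose=False,
)

# Placeholder paths: point these at your trained model folder and data folders.
predictor.initialize_from_trained_model_folder(
    '/path/to/trained_model_folder',
    use_folds=(0,),
    checkpoint_name='checkpoint_final.pth',
)

# Use a single preprocessing worker and a single export worker instead of the defaults.
predictor.predict_from_files(
    '/path/to/imagesVal',
    '/path/to/output_folder',
    save_probabilities=False,
    overwrite=True,
    num_processes_preprocessing=1,
    num_processes_segmentation_export=1,
)

If the run still crashes with a single worker, the traceback printed by the SpawnProcess (before the RuntimeError) is the one that points at the actual cause.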

I have the same issue when validating the 701 dataset.

Predicting FLARETs_0004:
perform_everything_on_device: True
Traceback (most recent call last):
  File "/root/miniconda3/envs/umamba/bin/nnUNetv2_predict", line 33, in <module>
    sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_predict')())
  File "/data_local/commit/U-Mamba/umamba/nnunetv2/inference/predict_from_raw_data.py", line 834, in predict_entry_point
    predictor.predict_from_files(args.i, args.o, save_probabilities=args.save_probabilities,
  File "/data_local/commit/U-Mamba/umamba/nnunetv2/inference/predict_from_raw_data.py", line 251, in predict_from_files
    return self.predict_from_data_iterator(data_iterator, save_probabilities, num_processes_segmentation_export)
  File "/data_local/commit/U-Mamba/umamba/nnunetv2/inference/predict_from_raw_data.py", line 367, in predict_from_data_iterator
    proceed = not check_workers_alive_and_busy(export_pool, worker_list, r, allowed_num_queued=2)
  File "/data_local/commit/U-Mamba/umamba/nnunetv2/utilities/file_path_utilities.py", line 103, in check_workers_alive_and_busy
    raise RuntimeError('Some background workers are no longer alive')
RuntimeError: Some background workers are no longer alive
output/nnunet_predict_701/FLARETs_0006
torch.Size([1, 210, 380, 380])
Process SpawnProcess-4:
Traceback (most recent call last):
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/data_local/commit/U-Mamba/umamba/nnunetv2/inference/data_iterators.py", line 61, in preprocess_fromfiles_save_to_queue
    raise e
  File "/data_local/commit/U-Mamba/umamba/nnunetv2/inference/data_iterators.py", line 50, in preprocess_fromfiles_save_to_queue
    target_queue.put(item, timeout=0.01)
  File "<string>", line 2, in put
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/managers.py", line 833, in _callmethod
    raise convert_to_error(kind, result)
multiprocessing.managers.RemoteError: 
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/managers.py", line 260, in serve_client
    self.id_to_local_proxy_obj[ident]
KeyError: '7feade491030'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/managers.py", line 262, in serve_client
    raise ke
  File "/root/miniconda3/envs/umamba/lib/python3.10/multiprocessing/managers.py", line 256, in serve_client
    obj, exposed, gettypeid = id_to_obj[ident]
KeyError: '7feade491030'
---------------------------------------------------------------------------
commented

I have a similar question:
Traceback (most recent call last):
  File "/mnt/zqk/.conda/envs/umamba/bin/nnUNetv2_train", line 33, in <module>
    sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
  File "/mnt/zqk/BTCV/model_test/SwinUNetR/U-Mamba-main/umamba/nnunetv2/run/run_training.py", line 268, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/mnt/zqk/BTCV/model_test/SwinUNetR/U-Mamba-main/umamba/nnunetv2/run/run_training.py", line 204, in run_training
    nnunet_trainer.run_training()
  File "/mnt/zqk/BTCV/model_test/SwinUNetR/U-Mamba-main/umamba/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1258, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "/mnt/zqk/BTCV/model_test/SwinUNetR/U-Mamba-main/umamba/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 900, in train_step
    output = self.network(data)
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/zqk/BTCV/model_test/SwinUNetR/U-Mamba-main/umamba/nnunetv2/nets/UMambaEnc.py", line 352, in forward
    skips = self.encoder(x)
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/zqk/BTCV/model_test/SwinUNetR/U-Mamba-main/umamba/nnunetv2/nets/UMambaEnc.py", line 163, in forward
    x = self.mamba_layers[s](x)
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/mnt/zqk/BTCV/model_test/SwinUNetR/U-Mamba-main/umamba/nnunetv2/nets/UMambaEnc.py", line 46, in forward
    x_mamba = self.mamba(x_norm)
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/mamba_ssm/modules/mamba_simple.py", line 146, in forward
    out = mamba_inner_fn(
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 317, in mamba_inner_fn
    return MambaInnerFn.apply(xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 98, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/mamba_ssm/ops/selective_scan_interface.py", line 187, in forward
    conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(
TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: Optional[torch.Tensor], arg3: Optional[torch.Tensor], arg4: bool) -> torch.Tensor

Invoked with: tensor([[[-0.0491, 0.1108, 0.1163, ..., -0.1036, -0.1036, 0.1690],
[-0.0855, -0.0731, -0.0899, ..., 0.0130, 0.0130, -0.4536],
[-0.5495, -0.3877, -0.4273, ..., 0.5782, 0.5782, 0.5817],
...,
[-0.0828, -0.2404, -0.1682, ..., -0.5110, -0.5110, -0.3344],
[ 0.4582, 0.3348, 0.4075, ..., -0.4091, -0.4091, -0.5827],
[-0.3157, -0.4055, -0.3878, ..., -0.5348, -0.5348, -0.1631]],

    [[-0.2968,  0.1006,  0.0193,  ..., -0.1083, -0.1083,  0.1561],
     [ 0.2038, -0.3414, -0.2363,  ...,  0.0206,  0.0206, -0.4348],
     [-0.4408, -0.2806, -0.3668,  ...,  0.5653,  0.5653,  0.5552],
     ...,
     [-0.2908, -0.2763, -0.2454,  ..., -0.5056, -0.5056, -0.3196],
     [ 0.0712,  0.1875,  0.1929,  ..., -0.4032, -0.4032, -0.5714],
     [ 0.2172, -0.4126, -0.3173,  ..., -0.5288, -0.5288, -0.1556]]],
   device='cuda:0', requires_grad=True), tensor([[ 0.2341, -0.4789, -0.0208,  0.3874],
    [ 0.3118, -0.2485, -0.2377,  0.4749],
    [ 0.1615,  0.4887,  0.4222,  0.0927],
    [-0.4377,  0.0944, -0.1609,  0.1785],
    [-0.1287, -0.3506, -0.4573,  0.1306],
    [-0.4764,  0.4150, -0.0747,  0.2143],
    [-0.3022,  0.1012, -0.2865,  0.1749],
    [ 0.3570, -0.1671,  0.0872,  0.0585],
    [ 0.0920, -0.4386, -0.1715, -0.3099],
    [-0.4049,  0.3184, -0.2907,  0.1653],
    [-0.1470, -0.4065, -0.2341, -0.2470],
    [ 0.1234, -0.4312, -0.4374, -0.0321],
    [-0.1388,  0.2120, -0.3501,  0.3295],
    [ 0.2373, -0.2574,  0.3881, -0.0462],
    [-0.2355, -0.2754, -0.0305, -0.1253],
    [-0.2050,  0.4809, -0.2690, -0.4272],
    [-0.0697, -0.4713,  0.2718, -0.3010],
    [-0.4286,  0.4376, -0.0172, -0.0580],
    [-0.4233, -0.0699,  0.4696,  0.4137],
    [-0.2423,  0.1985, -0.4915,  0.4235],
    [ 0.3321,  0.2985,  0.1137,  0.0867],
    [-0.3255,  0.4006, -0.1824,  0.1523],
    [ 0.1196,  0.2120, -0.0944,  0.0589],
    [ 0.1203, -0.3548, -0.3386,  0.2519],
    [-0.4032, -0.1827, -0.4929, -0.3168],
    [ 0.1784,  0.0232, -0.0495,  0.2028],
    [-0.3688, -0.4267,  0.3309,  0.4874],
    [ 0.2087,  0.1669, -0.0523, -0.0068],
    [-0.4391, -0.3192, -0.1679,  0.1272],
    [ 0.3335, -0.3604, -0.0523,  0.3940],
    [ 0.4864,  0.2191, -0.2756, -0.2299],
    [-0.0314, -0.1608,  0.2049, -0.3435],
    [ 0.4361, -0.0147, -0.4413, -0.3398],
    [-0.3455, -0.1826,  0.4357,  0.1847],
    [-0.2165,  0.0143,  0.0508, -0.4730],
    [-0.1745, -0.0662, -0.4641, -0.1006],
    [ 0.2257, -0.1837,  0.0892,  0.1096],
    [ 0.0550, -0.0665, -0.4336,  0.3441],
    [-0.2285,  0.4836,  0.2190, -0.2985],
    [-0.3567, -0.1537, -0.3341, -0.0128],
    [-0.2371, -0.0663, -0.4536,  0.0082],
    [ 0.0554,  0.0025, -0.1911,  0.1813],
    [-0.4043, -0.0907, -0.2568, -0.4053],
    [ 0.3486, -0.0422,  0.0131, -0.1056],
    [ 0.2010, -0.0947, -0.2872, -0.2287],
    [-0.4851, -0.1853, -0.4469,  0.4861],
    [ 0.4966, -0.3591,  0.1496,  0.0835],
    [ 0.4758,  0.2139, -0.0215, -0.2494],
    [ 0.2504, -0.0795, -0.4824,  0.1999],
    [-0.4946,  0.2453, -0.4168,  0.4381],
    [-0.3890, -0.3599, -0.3134, -0.0867],
    [ 0.4481,  0.2203,  0.2909, -0.3969],
    [ 0.3748, -0.0066, -0.4547, -0.1453],
    [-0.0921,  0.4662,  0.1144,  0.0293],
    [ 0.0494,  0.4914, -0.1008, -0.2834],
    [ 0.0627,  0.3752,  0.2841, -0.2204],
    [-0.4730, -0.0404, -0.0578,  0.4405],
    [ 0.3908,  0.4092, -0.3176,  0.4471],
    [ 0.4153, -0.1096,  0.4343,  0.3190],
    [-0.3495, -0.0411,  0.4542, -0.4151],
    [-0.0707, -0.4734, -0.2026, -0.4487],
    [ 0.0126,  0.0120,  0.1909,  0.2329],
    [ 0.2429, -0.0376, -0.2207,  0.1399],
    [ 0.4854, -0.3094,  0.3535,  0.0131]], device='cuda:0',
   requires_grad=True), Parameter containing:

tensor([ 0.0747, -0.4136, -0.1951, 0.0490, 0.1685, 0.4614, -0.4164, -0.2429,
-0.1325, 0.1565, -0.2873, -0.4702, -0.4290, 0.3216, 0.2686, 0.0714,
-0.1852, -0.4706, -0.1142, 0.4662, -0.0884, 0.2010, -0.3492, -0.2183,
0.3453, -0.1287, -0.3230, -0.1082, 0.4222, -0.2139, -0.3322, -0.4582,
0.1118, 0.3871, 0.1275, -0.1565, 0.3038, 0.2347, 0.0825, -0.2411,
-0.1832, -0.1973, -0.3639, -0.0440, 0.3261, -0.4377, -0.1061, -0.1320,
-0.3778, -0.2043, -0.3343, -0.4298, -0.3915, -0.0954, -0.0720, -0.3083,
-0.3703, 0.0058, -0.3564, 0.2000, 0.3286, 0.0201, -0.3494, -0.2806],
device='cuda:0', requires_grad=True), None, None, None, True
Exception in thread Thread-4 (results_loop):
Traceback (most recent call last):
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/mnt/zqk/.conda/envs/umamba/lib/python3.10/site-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

I have the same problem. I wonder whether it is caused by a lack of computing power on my machine or by a causal_conv1d error.
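The causal_conv1d_fwd() TypeError above is not a compute-capacity problem; it typically means the compiled causal-conv1d extension and the installed mamba-ssm release disagree about the binding's signature (the supported signature in the message takes five arguments, while mamba_ssm invokes it with seven). A diagnostic sketch, not part of U-Mamba, that prints the installed versions and the signature documented by the compiled binding:

import importlib.metadata as md

# Print whichever versions happen to be installed in this environment.
for pkg in ("torch", "mamba-ssm", "causal-conv1d"):
    try:
        print(f"{pkg}: {md.version(pkg)}")
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed")

# The compiled pybind11 extension documents the signature it accepts; compare the
# argument count with the failing call in mamba_ssm/ops/selective_scan_interface.py.
import causal_conv1d_cuda
print(causal_conv1d_cuda.causal_conv1d_fwd.__doc__)

If the argument counts disagree, reinstalling mamba-ssm and causal-conv1d at the versions given in the U-Mamba installation instructions is the commonly reported fix for this mismatch.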

I have the same issue:
2024-03-14 22:47:59.337693: unpacking dataset...
2024-03-14 22:47:59.667426: unpacking done...
2024-03-14 22:47:59.667924: do_dummy_2d_data_aug: False
2024-03-14 22:47:59.680668: Unable to plot network architecture:
2024-03-14 22:47:59.680761: No module named 'hiddenlayer'
2024-03-14 22:47:59.687895:
2024-03-14 22:47:59.687997: Epoch 0
2024-03-14 22:47:59.688121: Current learning rate: 0.01
using pin_memory on device 0
Traceback (most recent call last):
  File "/usr/local/bin/nnUNetv2_train", line 33, in <module>
    sys.exit(load_entry_point('nnunetv2', 'console_scripts', 'nnUNetv2_train')())
  File "/content/U-Mamba/umamba/nnunetv2/run/run_training.py", line 268, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/content/U-Mamba/umamba/nnunetv2/run/run_training.py", line 204, in run_training
    nnunet_trainer.run_training()
  File "/content/U-Mamba/umamba/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1258, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "/content/U-Mamba/umamba/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 900, in train_step
    output = self.network(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/U-Mamba/umamba/nnunetv2/nets/UMambaBot.py", line 207, in forward
    out = self.mamba(middle_feature_flat)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mamba_ssm/modules/mamba_simple.py", line 146, in forward
    out = mamba_inner_fn(
  File "/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py", line 317, in mamba_inner_fn
    return MambaInnerFn.apply(xz, conv1d_weight, conv1d_bias, x_proj_weight, delta_proj_weight,
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs) # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/autocast_mode.py", line 98, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/mamba_ssm/ops/selective_scan_interface.py", line 187, in forward
    conv1d_out = causal_conv1d_cuda.causal_conv1d_fwd(
TypeError: causal_conv1d_fwd(): incompatible function arguments. The following argument types are supported:
    1. (arg0: torch.Tensor, arg1: torch.Tensor, arg2: Optional[torch.Tensor], arg3: Optional[torch.Tensor], arg4: bool) -> torch.Tensor

Invoked with: tensor([[[-0.6382, -0.4265, -0.4575, ..., 0.3599, 0.4956, -0.2935],
[ 0.3787, 0.4329, -0.5562, ..., 0.3960, 0.4448, 0.5083],
[ 0.3894, -0.3760, -0.7661, ..., -0.1026, 0.1354, 0.1112],
...,
[-0.9312, 0.0642, 1.0596, ..., 0.0847, -0.7080, -1.0225],
[ 0.3203, 0.3088, 0.7344, ..., -0.4385, -0.1875, 0.0043],
[-0.0684, 0.5269, 0.0842, ..., -0.9180, 0.6592, -0.1587]],

    [[-0.0917,  0.5205,  0.0656,  ..., -0.0911,  0.4780,  0.3191],
     [ 0.3711,  0.3787,  0.2769,  ...,  0.3538,  0.7544,  0.1790],
     [ 0.1733,  0.3979, -0.9443,  ...,  0.0364, -0.0048,  0.1611],
     ...,
     [-0.3191, -0.3755,  0.1874,  ..., -1.1611, -2.0781, -0.3896],
     [-0.2236, -0.1467, -0.2396,  ..., -0.5059,  0.8311, -0.2249],
     [-0.3533,  0.1874,  0.8169,  ...,  0.0668, -1.0127, -0.8462]],

    [[ 0.3044, -0.1705,  0.0508,  ...,  0.9473,  0.3650,  0.4119],
     [ 0.3845, -0.3269,  0.3433,  ...,  0.8623,  0.1277, -0.4331],
     [-0.3108,  0.1451,  0.5273,  ..., -0.6851,  0.0798,  1.1738],
     ...,
     [-0.5898, -0.1954, -0.6011,  ..., -1.2520, -0.2563, -0.2235],
     [ 0.3484,  0.1733,  0.3040,  ...,  0.2323,  0.4419,  0.6016],
     [-0.1136,  0.1556, -0.0454,  ..., -0.1123, -0.6719, -0.4343]],

    ...,

    [[ 0.0906, -0.1805,  0.0513,  ..., -0.2311,  0.9443, -0.0299],
     [ 0.1479,  0.4751,  0.2771,  ...,  0.5444,  0.0271, -0.0429],
     [-0.0348, -0.5376, -0.2188,  ..., -0.6421, -0.1191,  0.3616],
     ...,
     [-0.2849,  0.5508,  0.3767,  ..., -0.7939, -1.1377,  0.0343],
     [ 0.4050,  0.1764, -0.2341,  ..., -0.5293,  0.0665, -0.3154],
     [-0.6084,  0.3564,  0.4814,  ...,  0.2068, -0.1576, -1.4629]],

    [[ 0.2343, -1.3701,  0.0996,  ...,  0.6562,  0.8042,  0.5381],
     [ 0.8242, -0.3867, -0.2098,  ..., -0.5718, -0.2374, -0.4104],
     [ 0.2595,  0.0446,  0.0566,  ..., -0.7080,  0.3467,  0.4282],
     ...,
     [ 0.0127, -0.6870, -0.4365,  ..., -1.0381, -0.1013, -0.3372],
     [ 0.3218, -0.0968, -0.0614,  ..., -0.7754,  0.0306,  0.5405],
     [ 0.0759, -0.0693,  0.2468,  ..., -0.2004, -0.8022, -0.5728]],

    [[ 0.0079, -0.5547, -0.2485,  ...,  0.1643, -0.2012, -0.0824],
     [-0.2629,  0.1907, -0.0386,  ...,  0.1499, -0.0655, -0.3374],
     [ 0.5361,  0.8872, -0.5195,  ..., -0.7358, -0.0739,  0.2191],
     ...,
     [ 0.3469,  0.1092,  0.3921,  ..., -0.2266, -0.2871, -0.5259],
     [ 0.1659, -0.4648, -0.6831,  ..., -0.4575,  0.1437,  0.4517],
     [-0.5703,  1.1455,  0.3037,  ..., -0.2773,  0.3811, -0.2539]]],
   device='cuda:0', dtype=torch.float16, requires_grad=True), tensor([[-0.1942,  0.0138,  0.4839, -0.3377],
    [-0.4573,  0.2494,  0.2311, -0.2305],
    [-0.4981,  0.1973, -0.1062,  0.4207],
    ...,
    [ 0.0431,  0.0944, -0.0752,  0.0998],
    [ 0.2777, -0.1555,  0.3114, -0.4224],
    [ 0.4277,  0.3805,  0.3027,  0.4618]], device='cuda:0',
   requires_grad=True), Parameter containing:

tensor([ 0.1733, -0.3476, 0.3965, ..., -0.3649, -0.4902, -0.2539],
device='cuda:0', requires_grad=True), None, None, None, True
Exception in thread Thread-4 (results_loop):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.10/dist-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 125, in results_loop
    raise e
  File "/usr/local/lib/python3.10/dist-packages/batchgenerators/dataloading/nondet_multi_threaded_augmenter.py", line 103, in results_loop
    raise RuntimeError("One or more background workers are no longer alive. Exiting. Please check the "
RuntimeError: One or more background workers are no longer alive. Exiting. Please check the print statements above for the actual error message

I have the same problem. Is there a solution?
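One way to tell a broken mamba-ssm / causal-conv1d installation apart from a U-Mamba or hardware problem is to run a standalone Mamba block outside nnU-Net. This is only a sanity-check sketch under the assumption that mamba-ssm is installed and a CUDA GPU is available; it is not taken from the repository.

import torch
from mamba_ssm import Mamba

# A tiny standalone Mamba block; the dimensions are arbitrary test values.
block = Mamba(d_model=64, d_state=16, d_conv=4, expand=2).cuda()
x = torch.randn(2, 128, 64, device="cuda", requires_grad=True)  # (batch, length, d_model)
y = block(x)          # forward pass uses the fused causal_conv1d path when causal-conv1d is installed
y.mean().backward()   # backward pass exercises the fused CUDA kernels as well
print("standalone Mamba forward/backward OK:", tuple(y.shape))

If this short script reproduces the causal_conv1d_fwd() TypeError, the package versions or CUDA build are at fault rather than the amount of compute; if it runs cleanly, the next place to look is the training environment itself (for example mixed-precision settings or the installed U-Mamba code).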

commented

Hi All,

Please try the new code. We re-implemented the sampling function to improve efficiency.

We have also released the corresponding segmentation results:
https://drive.google.com/file/d/1qlzTym3YdyCt3eR8J90h636it4cCDCt8/view?usp=sharing