synthseg error when using GPU
Opened by jhauneuro
Summary
When running eddy with CUDA, I get an error in synthseg. The interactive HTML report is also not created.
Additional details
- QSIPrep version: 0.21.4
- Docker version: 25.0.4
What were you trying to do?
Preprocess data with QSIPrep using eddy with CUDA.
What did you expect to happen?
Successfully preprocess the data and all outputs.
What actually happened?
The DWI preprocessing was successful, but I got an error in synthseg for the anatomical data. The subject's HTML report was also missing.
Reproducing the bug
Command
docker run -ti --rm --gpus all \
-v /usr/local/freesurfer/7.4.1/license.txt:/opt/freesurfer/license.txt:ro \
-v $bids_dir:/data:ro \
-v $eddy_config_file:/sngl/eddy/eddy_config.json:ro \
-v $out_dir:/out \
-v $out_dir:/scratch \
qsiprep:latest /data /out participant \
--eddy-config /sngl/eddy/eddy_config.json \
--output-resolution $res --pepolar-method TOPUP \
--bids-database-dir $bids_dir --omp-nthreads 1 \
-w /scratch
Crash log
Node: qsiprep_wf.single_subject_0158_wf.anat_preproc_wf.synthseg_anat_wf.synthseg
Working directory: /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg
Node inputs:
args = <undefined>
environ = {'OMP_NUM_THREADS': '1'}
fast = False
input_image = <undefined>
num_threads = 1
out_post = <undefined>
out_qc = <undefined>
out_seg = <undefined>
robust = <undefined>
subjects_dir = <undefined>
Traceback (most recent call last):
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
result["result"] = node.run(updatehash=updatehash)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
result = self._run_interface(execute=True)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
return self._run_command(execute)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node synthseg.
Cmdline:
mri_synthseg --i /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/pad_anat_reference_wf/resample_skulled_to_reference/sub-0158_T1w_lps_trans.nii.gz --threads 1 --post sub-0158_T1w_lps_trans_post.nii.gz --qc sub-0158_T1w_lps_trans_qc.csv --o sub-0158_T1w_lps_trans_aseg.nii.gz
Stdout:
SynthSeg 2.0
using 1 thread
predicting 1/1
the following problem occured with image /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/pad_anat_reference_wf/resample_skulled_to_reference/sub-0158_T1w_lps_trans.nii.gz :
Traceback (most recent call last):
File "/opt/freesurfer/bin/mri_synthseg", line 273, in predict
post_patch_segmentation, qc_score = net.predict([image, shape_input])
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:
Detected at node 'model_3/unet_up_8/concat_2' defined at (most recent call last):
File "/opt/freesurfer/bin/mri_synthseg", line 2581, in <module>
main()
File "/opt/freesurfer/bin/mri_synthseg", line 123, in main
predict(
File "/opt/freesurfer/bin/mri_synthseg", line 273, in predict
post_patch_segmentation, qc_score = net.predict([image, shape_input])
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1982, in predict
tmp_batch_outputs = self.predict_function(iterator)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1801, in predict_function
return step_function(self, iterator)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1790, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1783, in run_step
outputs = model.predict_step(data)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1751, in predict_step
return self(x, training=False)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1096, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/functional.py", line 451, in call
return self._run_internal_graph(
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/functional.py", line 589, in _run_internal_graph
outputs = node.layer(*args, **kwargs)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1096, in __call__
outputs = call_fn(inputs, *args, **kwargs)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
return fn(*args, **kwargs)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/layers/convolutional.py", line 3043, in call
return backend.resize_volumes(
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/backend.py", line 3461, in resize_volumes
output = repeat_elements(output, width_factor, axis=3)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/backend.py", line 3501, in repeat_elements
return concatenate(x_rep, axis)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/backend.py", line 3313, in concatenate
return tf.concat([to_dense(x) for x in tensors], axis)
Node: 'model_3/unet_up_8/concat_2'
OOM when allocating tensor with shape[1,48,192,288,192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_3/unet_up_8/concat_2}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[Op:__inference_predict_function_5912]
resuming program execution
segmentation saved in: /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_aseg.nii.gz
posteriors saved in: /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_post.nii.gz
QC scores saved in: /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_qc.csv
If you use this tool in a publication, please cite:
SynthSeg: domain randomisation for segmentation of brain MRI scans of any contrast and resolution
B. Billot, D.N. Greeve, O. Puonti, A. Thielscher, K. Van Leemput, B. Fischl, A.V. Dalca, J.E. Iglesias
ERROR: some problems occured for the following inputs (see corresponding errors above):
/scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/pad_anat_reference_wf/resample_skulled_to_reference/sub-0158_T1w_lps_trans.nii.gz
Stderr:
DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Traceback:
Traceback (most recent call last):
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 453, in aggregate_outputs
setattr(outputs, key, val)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
value = super(File, self).validate(objekt, name, value, return_pathlike=True)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
self.error(objekt, name, str(value))
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/traits/base_trait_handler.py", line 74, in error
raise TraitError(
traits.trait_errors.TraitError: The 'out_seg' trait of a _SynthSegOutputSpec instance must be a pathlike object or string representing an existing file, but a value of '/scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_aseg.nii.gz' <class 'str'> was specified.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 400, in run
outputs = self.aggregate_outputs(runtime)
File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 460, in aggregate_outputs
raise FileNotFoundError(msg)
FileNotFoundError: No such file or directory '/scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_aseg.nii.gz' for output 'out_seg' of a SynthSeg interface
Do you know how much memory your GPU has? I get this error when my GPU runs out of memory.
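For anyone unsure how to answer that, total GPU memory can be read from nvidia-smi. Here is a minimal sketch, assuming nvidia-smi is on the PATH of the host (or of a container started with --gpus all). The helper names parse_gpu_memory_mib and query_gpu_memory_mib are made up for illustration and are not part of QSIPrep.

```python
import subprocess


def parse_gpu_memory_mib(nvidia_smi_output: str) -> list[int]:
    """Parse one integer (MiB) per non-empty line of nvidia-smi CSV output."""
    return [
        int(line.strip())
        for line in nvidia_smi_output.splitlines()
        if line.strip()
    ]


def query_gpu_memory_mib() -> list[int]:
    """Return total memory in MiB for each GPU that nvidia-smi can see."""
    completed = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.total",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_memory_mib(completed.stdout)


# Usage (requires an NVIDIA driver on the machine):
#     print(query_gpu_memory_mib())
```

Each list entry is one GPU, so a machine with several cards returns several values.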
I'm using NVIDIA RTX A2000 GPUs, which have 6 GB of memory each. I don't know if that is sufficient for synthseg. Would it be possible to set synthseg to run on the CPU and eddy to run on the GPU?
Oh yeah, that will definitely not be enough for synthseg. In the current unstable version, the CPU version is requested by default. Could you check whether it works with pennbbl/qsiprep:unstable?
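A rough back-of-the-envelope check of why 6 GB falls short: the crash log reports the OOM while allocating a single tensor of shape [1,48,192,288,192] and type float. The sketch below assumes 4 bytes per element (TensorFlow's float is 32-bit) and only sizes that one tensor. The network's weights and other live activations come on top of it, so the true footprint is larger by an amount the log does not state.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor; the default assumes float32 elements."""
    return prod(shape) * bytes_per_element


# Shape copied from the OOM message in the crash log above.
oom_shape = (1, 48, 192, 288, 192)
size = tensor_bytes(oom_shape)
size_gib = size / 2**30

print(f"{size} bytes = {size_gib:.2f} GiB")
```

That one allocation is close to a third of a 6 GB card, and it is requested at a concatenation step whose inputs also have to be held in memory at the same time.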
Yes, it works! I no longer get the synthseg error.
The HTML report is not being generated, though, and the run seems to be stuck after this log line:
[Node] Finished "interactive_report", elapsed time 27.397992s.
It was still being generated in version 0.20.0.
Did you use a new working directory? That should help with the reports.
Yeah I did, it's a clean output and working directory.
What about if you set --nthreads 1?
The cut-off top of the brain should probably go into a new issue if you're sure it's not an issue with the FOV of the original data. Closing this issue since the GPU problem has been solved.
I am now getting this synthseg error with v0.22.0 using --omp-nthreads 1 and --nthreads 1. (The same command ran successfully on v0.21.5.) Because of this error, it also fails to output the interactive report. I would like to have preprocessed T1w data but don't necessarily need the anatomical segmentations for now; I can always run that step separately without the GPU setting.
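Running the segmentation separately on the CPU could look roughly like the sketch below. It reuses the --i, --o and --threads options from the Cmdline in the crash log. The --cpu option is an assumption to confirm with mri_synthseg --help for your FreeSurfer version. Setting CUDA_VISIBLE_DEVICES to an empty string is a general way to hide GPUs from TensorFlow. The function name build_cpu_synthseg_call and the file names are hypothetical.

```python
import os
import subprocess


def build_cpu_synthseg_call(t1w_path, out_seg_path, threads=1):
    """Build a CPU-only mri_synthseg command and environment (nothing is run)."""
    cmd = [
        "mri_synthseg",
        "--i", t1w_path,
        "--o", out_seg_path,
        "--threads", str(threads),
        "--cpu",  # assumed flag: verify with `mri_synthseg --help`
    ]
    # An empty CUDA_VISIBLE_DEVICES hides all GPUs from this process only.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="")
    return cmd, env


# Usage (requires FreeSurfer; the paths are placeholders):
#     cmd, env = build_cpu_synthseg_call("sub-0158_T1w.nii.gz", "sub-0158_aseg.nii.gz")
#     subprocess.run(cmd, env=env, check=True)
```

Because the environment is passed only to that one subprocess, eddy can keep using the GPU in a separate run.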