PennLINC / qsiprep

Preprocessing of diffusion MRI

Home Page: http://qsiprep.readthedocs.io

synthseg error when using GPU

jhauneuro opened this issue

Summary

When running eddy with CUDA, I get an error in synthseg. The interactive HTML report was also not created.

Additional details

  • QSIPrep version: 0.21.4
  • Docker version: 25.0.4

What were you trying to do?

Preprocess data with qsiprep using eddy with CUDA.

What did you expect to happen?

The data would be preprocessed successfully and all outputs would be produced.

What actually happened?

The DWI preprocessing was successful, but I got an error in synthseg for the anatomical data. The subject's HTML report was also missing.

Reproducing the bug

Command

docker run -ti --rm --gpus all \
	-v /usr/local/freesurfer/7.4.1/license.txt:/opt/freesurfer/license.txt:ro \
	-v $bids_dir:/data:ro \
	-v $eddy_config_file:/sngl/eddy/eddy_config.json:ro \
	-v $out_dir:/out \
	-v $out_dir:/scratch \
	qsiprep:latest /data /out participant \
	--eddy-config /sngl/eddy/eddy_config.json \
	--output-resolution $res --pepolar-method TOPUP \
	--bids-database-dir $bids_dir --omp-nthreads 1 \
	-w /scratch

Crash log

Node: qsiprep_wf.single_subject_0158_wf.anat_preproc_wf.synthseg_anat_wf.synthseg
Working directory: /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg

Node inputs:

args = <undefined>
environ = {'OMP_NUM_THREADS': '1'}
fast = False
input_image = <undefined>
num_threads = 1
out_post = <undefined>
out_qc = <undefined>
out_seg = <undefined>
robust = <undefined>
subjects_dir = <undefined>

Traceback (most recent call last):
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/plugins/multiproc.py", line 67, in run_node
    result["result"] = node.run(updatehash=updatehash)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 527, in run
    result = self._run_interface(execute=True)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 645, in _run_interface
    return self._run_command(execute)
  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/pipeline/engine/nodes.py", line 771, in _run_command
    raise NodeExecutionError(msg)
nipype.pipeline.engine.nodes.NodeExecutionError: Exception raised while executing Node synthseg.

Cmdline:
	mri_synthseg --i /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/pad_anat_reference_wf/resample_skulled_to_reference/sub-0158_T1w_lps_trans.nii.gz --threads 1 --post sub-0158_T1w_lps_trans_post.nii.gz --qc sub-0158_T1w_lps_trans_qc.csv --o sub-0158_T1w_lps_trans_aseg.nii.gz
Stdout:
	SynthSeg 2.0
	using 1 thread
	predicting 1/1

	the following problem occured with image /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/pad_anat_reference_wf/resample_skulled_to_reference/sub-0158_T1w_lps_trans.nii.gz :
	Traceback (most recent call last):
	  File "/opt/freesurfer/bin/mri_synthseg", line 273, in predict
	    post_patch_segmentation, qc_score = net.predict([image, shape_input])
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
	    raise e.with_traceback(filtered_tb) from None
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
	    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
	tensorflow.python.framework.errors_impl.ResourceExhaustedError: Graph execution error:

	Detected at node 'model_3/unet_up_8/concat_2' defined at (most recent call last):
	    File "/opt/freesurfer/bin/mri_synthseg", line 2581, in <module>
	      main()
	    File "/opt/freesurfer/bin/mri_synthseg", line 123, in main
	      predict(
	    File "/opt/freesurfer/bin/mri_synthseg", line 273, in predict
	      post_patch_segmentation, qc_score = net.predict([image, shape_input])
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
	      return fn(*args, **kwargs)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1982, in predict
	      tmp_batch_outputs = self.predict_function(iterator)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1801, in predict_function
	      return step_function(self, iterator)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1790, in step_function
	      outputs = model.distribute_strategy.run(run_step, args=(data,))
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1783, in run_step
	      outputs = model.predict_step(data)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/training.py", line 1751, in predict_step
	      return self(x, training=False)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
	      return fn(*args, **kwargs)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1096, in __call__
	      outputs = call_fn(inputs, *args, **kwargs)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
	      return fn(*args, **kwargs)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/functional.py", line 451, in call
	      return self._run_internal_graph(
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/functional.py", line 589, in _run_internal_graph
	      outputs = node.layer(*args, **kwargs)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
	      return fn(*args, **kwargs)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/engine/base_layer.py", line 1096, in __call__
	      outputs = call_fn(inputs, *args, **kwargs)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
	      return fn(*args, **kwargs)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/layers/convolutional.py", line 3043, in call
	      return backend.resize_volumes(
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/backend.py", line 3461, in resize_volumes
	      output = repeat_elements(output, width_factor, axis=3)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/backend.py", line 3501, in repeat_elements
	      return concatenate(x_rep, axis)
	    File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/keras/backend.py", line 3313, in concatenate
	      return tf.concat([to_dense(x) for x in tensors], axis)
	Node: 'model_3/unet_up_8/concat_2'
	OOM when allocating tensor with shape[1,48,192,288,192] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
		 [[{{node model_3/unet_up_8/concat_2}}]]
	Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
	 [Op:__inference_predict_function_5912]

	resuming program execution


	segmentation  saved in:    /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_aseg.nii.gz
	posteriors saved in:       /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_post.nii.gz
	QC scores saved in:        /scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_qc.csv

	If you use this tool in a publication, please cite:
	SynthSeg: domain randomisation for segmentation of brain MRI scans of any contrast and resolution
	B. Billot, D.N. Greeve, O. Puonti, A. Thielscher, K. Van Leemput, B. Fischl, A.V. Dalca, J.E. Iglesias

	ERROR: some problems occured for the following inputs (see corresponding errors above):
	/scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/pad_anat_reference_wf/resample_skulled_to_reference/sub-0158_T1w_lps_trans.nii.gz
Stderr:
	DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
	Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
	DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
	Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
Traceback:
	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 453, in aggregate_outputs
	    setattr(outputs, key, val)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 330, in validate
	    value = super(File, self).validate(objekt, name, value, return_pathlike=True)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/traits_extension.py", line 135, in validate
	    self.error(objekt, name, str(value))
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/traits/base_trait_handler.py", line 74, in error
	    raise TraitError(
	traits.trait_errors.TraitError: The 'out_seg' trait of a _SynthSegOutputSpec instance must be a pathlike object or string representing an existing file, but a value of '/scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_aseg.nii.gz' <class 'str'> was specified.

	During handling of the above exception, another exception occurred:

	Traceback (most recent call last):
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 400, in run
	    outputs = self.aggregate_outputs(runtime)
	  File "/opt/conda/envs/qsiprep/lib/python3.10/site-packages/nipype/interfaces/base/core.py", line 460, in aggregate_outputs
	    raise FileNotFoundError(msg)
	FileNotFoundError: No such file or directory '/scratch/qsiprep_wf/single_subject_0158_wf/anat_preproc_wf/synthseg_anat_wf/synthseg/sub-0158_T1w_lps_trans_aseg.nii.gz' for output 'out_seg' of a SynthSeg interface

Do you know how much memory your GPU has? I get this error when my GPU runs out of memory.
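For anyone answering this question on their own machine, here is a minimal sketch that reads total GPU memory through `nvidia-smi`. It assumes the NVIDIA driver tools are installed on the host and that `nvidia-smi` is on the PATH. The helper names `parse_gpu_memory_mib` and `query_gpu_memory_mib` are made up for illustration.

```python
import subprocess


def parse_gpu_memory_mib(nvidia_smi_output: str) -> list[int]:
    """Parse nvidia-smi CSV output that has one total-memory value (MiB) per line."""
    return [
        int(line.strip())
        for line in nvidia_smi_output.splitlines()
        if line.strip()
    ]


def query_gpu_memory_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB.

    Assumes nvidia-smi is installed and on the PATH (hypothetical setup).
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory_mib(result.stdout)
```

Calling `query_gpu_memory_mib()` on a machine with NVIDIA GPUs should return one integer per device. The parsing step is kept separate so it can be checked without a GPU.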

I'm using NVIDIA RTX A2000 GPUs that have 6 GB of memory. I don't know if that is sufficient for synthseg. Would it be possible to set synthseg to run on the CPU and eddy to run on the GPU?
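As a rough sanity check on whether 6 GB is enough, the size of the single tensor named in the OOM message (`shape[1,48,192,288,192]`, type float) can be worked out by hand. The sketch below is arithmetic only: it assumes 4 bytes per element (TensorFlow's `float` is 32-bit) and does not measure synthseg's actual peak usage. The helper name `tensor_size_bytes` is illustrative.

```python
from math import prod


def tensor_size_bytes(shape, bytes_per_element=4):
    """Return the memory needed to hold one dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


# Shape copied from the OOM message in the crash log above.
oom_shape = (1, 48, 192, 288, 192)

size_bytes = tensor_size_bytes(oom_shape)
size_gib = size_bytes / 2**30

print(f"{size_bytes} bytes = {size_gib:.2f} GiB")
# prints "2038431744 bytes = 1.90 GiB"
```

That single allocation is already about a third of a 6 GB card. A concatenation also needs its inputs resident alongside the output, and earlier U-Net activations are presumably still held for the skip connections, so running out of memory at this node is plausible.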

Oh yeah, that will definitely not be enough for synthseg. In the current unstable version, the CPU version is requested by default. Could you see if it works with pennbbl/qsiprep:unstable?

Yes, it works! I no longer get the synthseg error.

The HTML report is not being generated, though; the run seems to be stuck after [Node] Finished "interactive_report", elapsed time 27.397992s. The report was still being generated in version 0.20.0.

Did you use a new working directory? That should help with the reports.

Yeah, I did. It's a clean output and working directory.

What about if you set --nthreads 1?

It takes much longer, but it successfully outputs the html reports!

However, the top of the brain is cut off (see screenshots below). This wasn't the case for qsiprep 0.19.1. Is there a way to prevent or configure this so we don't lose brain coverage?

[Screenshot: sdc_b0ref_qsiprep_0 21 4_before]

[Screenshot: sdc_b0ref_qsiprep_0 21 4_after]

The cut-off top of the brain should probably go into a new issue if you're sure it's not a problem with the FOV of the original data. Closing this issue since the GPU problem has been solved.

I am now getting this synthseg error with v0.22.0 using --omp-nthreads 1 and --nthreads 1. (The same command ran successfully on v0.21.5.) Because of this error, it also fails to output the interactive report. I would like to have preprocessed T1w data but don't necessarily need the anatomical segmentations for now; I can always run that step separately without the GPU setting.
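For the "run that step separately" workaround, here is a hedged sketch of assembling a CPU-only `mri_synthseg` call. It reuses the flags shown in the crash log (`--i`, `--o`, `--post`, `--qc`, `--threads`) and adds `--cpu`, which is assumed to be supported by the installed FreeSurfer build; check `mri_synthseg --help` to confirm. The file names and the helper `build_synthseg_cpu_cmd` are hypothetical.

```python
import shlex


def build_synthseg_cpu_cmd(t1w, out_seg, out_post, out_qc, threads=1):
    """Build an mri_synthseg argument list that requests CPU execution.

    The --cpu flag is an assumption about the installed FreeSurfer version.
    """
    return [
        "mri_synthseg",
        "--i", t1w,
        "--o", out_seg,
        "--post", out_post,
        "--qc", out_qc,
        "--threads", str(threads),
        "--cpu",
    ]


# Hypothetical file names, for illustration only.
cmd = build_synthseg_cpu_cmd(
    "sub-0158_T1w.nii.gz",
    "sub-0158_T1w_aseg.nii.gz",
    "sub-0158_T1w_post.nii.gz",
    "sub-0158_T1w_qc.csv",
)

# Print the command for inspection; run it with subprocess.run(cmd, check=True).
print(shlex.join(cmd))
```

Building the command as a list avoids shell quoting problems when paths contain spaces. The snippet only prints the command, so it can be inspected before anything is executed.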