stardist / stardist

StarDist - Object Detection with Star-convex Shapes

`use_gpu` flag not working as expected

zoccoler opened this issue

Describe the bug
Hi stardist developers,

Some collaborators and I were running the 3D example training notebook in this conda environment without gputools.
It seems that even if the use_gpu flag is set to False, the GPU gets used anyway, as the output of nvidia-smi indicates.
There were also cases where we got OOM errors, but after plugging in an external GPU, the errors were not raised anymore (use_gpu still False).
Thus, it seems gputools is not needed for the GPU to be used.

Apart from that, a line in cell 10 seems to be wrong: use_gpu = False and gputools_available() should be use_gpu = True and gputools_available(). Of course, in our case without gputools, use_gpu is False.

So it looks like, if a GPU is available, it gets used regardless of the use_gpu flag.

To reproduce
Run the 3D example training notebook in this conda environment, which does not include gputools, and assess GPU usage (e.g., via nvidia-smi).

Expected behavior
The GPU should only be used if use_gpu is True.
Also, if gputools is not needed, it should not be used in the notebook.

Data and screenshots
None

Environment (please complete the following information):

  • StarDist version: 0.8.3
  • CSBDeep version: 0.7.3
  • TensorFlow version: 2.10.0
  • OS: Windows and Linux
  • GPU memory (if applicable): 4 GB (Windows), ~40 GB (Linux)

tf.config.list_physical_devices("GPU") returns a non-empty list (i.e., TensorFlow sees the GPU)

Hi @zoccoler,

It seems that even if the use_gpu flag is set to False, the GPU gets used anyway, as the output of nvidia-smi indicates.

In the Jupyter notebook, the help message regarding the use_gpu flag says "Indicate that the data generator should use OpenCL to do computations on the GPU." and there is also a comment right above it in the code:

# Use OpenCL-based computations for data generator during training (requires 'gputools')
use_gpu = False and gputools_available()

Hence, this flag is only about using the GPU (via OpenCL) for StarDist's data generator. It has no effect on whether TensorFlow is using the GPU or not (this is a separate issue, cf. "CUDA_VISIBLE_DEVICES").
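
As an aside, whether TensorFlow itself uses the GPU is controlled separately. A minimal sketch (assuming you want to hide the GPU from TensorFlow entirely) is to set CUDA_VISIBLE_DEVICES before TensorFlow is initialized:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide all GPUs from TensorFlow; must happen before TF initializes
import tensorflow as tf
print(tf.config.list_physical_devices("GPU"))  # expected to print [] when the GPU is hidden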

There were also cases where we got OOM errors, but after plugging in an external GPU, the errors were not raised anymore (use_gpu still False).

This is all related to TensorFlow's use of the GPU.
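
A rough sketch of one thing worth trying (assuming the OOM stems from TensorFlow's default behavior of reserving most GPU memory upfront) is to enable memory growth before training starts:

import tensorflow as tf
for gpu in tf.config.list_physical_devices("GPU"):
    # allocate GPU memory on demand instead of reserving (almost) all of it at startup
    tf.config.experimental.set_memory_growth(gpu, True)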

Apart from that, a line in cell 10 seems to be wrong: use_gpu = False and gputools_available() should be use_gpu = True and gputools_available().

No, this is intentional. The and gputools_available() part acts as a "guard" to always disable this flag when gputools is not installed.
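
To spell the guard out (a minimal sketch; gputools_available is imported here as in the example notebooks):

from stardist import gputools_available

# the right-hand side can only be True if gputools is importable
use_gpu = False and gputools_available()  # notebook default: always False
use_gpu = True and gputools_available()   # opt in: True only when gputools is installed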

So it looks like, if a GPU is available, it gets used regardless of the use_gpu flag.

If TensorFlow is configured to use the GPU (as mentioned above).
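
If you want to prevent TensorFlow from using the GPU from within Python instead of via the environment variable above, a sketch (it must run before the GPU is first used) would be:

import tensorflow as tf
tf.config.set_visible_devices([], "GPU")  # make no GPUs visible to TensorFlow in this process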

Hi @uschmidt83 ,

Thanks for clarifying all that!
Indeed, I did not notice that this flag applies exclusively to the data generation step during training. Just the fact that there was a variable use_gpu set to False led me to the wrong conclusions. Perhaps more explicit naming could help avoid confusion like this in the future, something like use_gpu_for_data_gen.

No, this is intentional. The and gputools_available() part acts as a "guard" to always disable this flag when gputools is not installed.

I see, so it is up to the user to actively change it if they want to use it for data generation.

Thanks, I believe things are clear for me now.

Perhaps more explicit naming could help avoid confusion like this in the future, something like use_gpu_for_data_gen.

In hindsight, that would've been better. Maybe we'll do that in a future release, but it's not trivial since we want to be backwards-compatible with the existing config format of already-trained models.

I see, so it is up to the user to actively change it if they want to use it for data generation.

Yes, it's kind of an advanced option that's not necessary, but that we personally use quite often.