`use_gpu` flag not working as expected
zoccoler opened this issue · comments
Describe the bug
Hi stardist developers,
Some collaborators and I were running the 3D example training notebook in this conda environment without `gputools`.
It seems that even if the `use_gpu` flag is set to `False`, the GPU gets used anyway, as the output of `nvidia-smi` indicates.
There were also cases where we got OOM errors, but after plugging in an external GPU, the errors were not raised anymore (`use_gpu` still `False`).
Thus, it seems `gputools` is unnecessary.
Apart from that, a line in cell 10 seems to be wrong: `use_gpu = False and gputools_available()` should be `use_gpu = True and gputools_available()`. Of course in our case, without `gputools`, `use_gpu` is `False`.
So it looks like, if there is a GPU available, it is used regardless of the `use_gpu` flag.
To reproduce
Run the 3D example training notebook in this conda environment, which does not have `gputools`, and assess GPU usage.
Expected behavior
The GPU should only be used if `use_gpu` is `True`.
Also, if `gputools` is not needed, it should not be used in the notebook.
Data and screenshots
None
Environment (please complete the following information):
- StarDist version: 0.8.3
- CSBDeep version: 0.7.3
- TensorFlow version: 2.10.0
- OS: Windows and Linux
- GPU memory (if applicable): 4 GB (Windows), ~40 GB (Linux)
`tf.config.list_physical_devices("GPU")` returns a non-empty list (i.e., a GPU is visible to TensorFlow).
Hi @zoccoler,
> It seems that even if the `use_gpu` flag is set to `False`, the GPU gets used anyway, as the output of `nvidia-smi` indicates.
In the Jupyter notebook, the help message regarding the `use_gpu` flag says "Indicate that the data generator should use OpenCL to do computations on the GPU.", and there is also a comment right above it in the code:

```python
# Use OpenCL-based computations for data generator during training (requires 'gputools')
use_gpu = False and gputools_available()
```
Hence, this flag is only about using the GPU (via OpenCL) for StarDist's data generator. It has no effect on whether TensorFlow is using the GPU or not (this is a separate issue, cf. "CUDA_VISIBLE_DEVICES").
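To keep TensorFlow itself off the GPU (which is what `nvidia-smi` was reporting), the usual approach is to hide the CUDA devices before TensorFlow is imported. A minimal sketch, assuming a CUDA build of TensorFlow; the commented-out lines only indicate where the TensorFlow import would go:

```python
import os

# Hide all CUDA devices from TensorFlow. This must be set *before*
# TensorFlow is imported, because device discovery happens at import time.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf                   # imported afterwards, TF only sees the CPU
# tf.config.list_physical_devices("GPU")    # would now return an empty list
```

This is independent of stardist's `use_gpu` flag, which only concerns the OpenCL-based data generator.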
> There were also cases where we got OOM errors, but after plugging in an external GPU, the errors were not raised anymore (`use_gpu` still `False`).
This is all related to TensorFlow's use of the GPU.
> Apart from that, a line in cell 10 seems to be wrong: `use_gpu = False and gputools_available()` should be `use_gpu = True and gputools_available()`.
No, this is intentional. The `and gputools_available()` part acts as a "guard" to always disable this flag when `gputools` is not installed.
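The guard relies on Python's short-circuiting `and`: when the left operand is `False`, the right side is never even evaluated, and the expression can only be `True` if both the literal is `True` and `gputools` is importable. A minimal sketch, using a stand-in `gputools_available` that pretends `gputools` is not installed:

```python
def gputools_available():
    # Stand-in for stardist's real helper; pretend gputools is missing.
    return False

use_gpu = False and gputools_available()  # left side is False: helper never called
print(use_gpu)   # False

use_gpu = True and gputools_available()   # helper is consulted, gputools missing
print(use_gpu)   # False
```

So flipping the literal to `True` merely opts in; the flag still ends up `False` on machines without `gputools`.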
> So it looks like, if there is a GPU available, it uses it regardless of the `use_gpu` flag.
Yes, if TensorFlow is configured to use the GPU (as mentioned above).
Hi @uschmidt83 ,
Thanks for clarifying all that!
Indeed, I did not notice that this flag applies exclusively to the data generation step during training. Just the fact that there was a variable `use_gpu` set to `False` led me to wrong conclusions. Perhaps more explicit naming could help avoid confusion like this in the future, something like `use_gpu_for_data_gen`.
> No, this is intentional. The `and gputools_available()` part acts as a "guard" to always disable this flag when `gputools` is not installed.
I see, so it is up to the user to actively change it if they want to use it for data generation.
Thanks, I believe things are clear for me now.
> Perhaps more explicit naming could help avoid confusion like this in the future, something like `use_gpu_for_data_gen`.
In hindsight, that would've been better. Maybe we'll do that in a future release, but it's not trivial since we want to be backwards-compatible with the existing config format of already-trained models.
I see, so it is up to the user to actively change it if they want to use it for data generation.
Yes, it's kind of an advanced option that's not necessary, but that we personally use quite often.