Can't fit with ddp_notebook on a Vertex AI Workbench instance (CUDA initialized)
Opened by jasonbrancazio (Jason Brancazio)
Bug description
Using this minimal code example:
```python
import torch
import lightning as L

print(torch.cuda.is_initialized())

trainer = L.Trainer(
    accelerator="auto",
    strategy="ddp_notebook",
    devices="auto",
    max_epochs=1,
    # callbacks=callbacks,
    log_every_n_steps=1
)

print(torch.cuda.is_initialized())
```
On Google Colab with a T4 attached, both print statements print "False" as expected.
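On a Vertex AI Workbench instance with a T4 attached, the second statement prints "True": merely instantiating the Trainer initializes CUDA. This prevents fitting with DDP, because the `ddp_notebook` strategy starts its worker processes by forking the notebook process, and CUDA cannot be re-initialized in a forked child once the parent process has initialized it.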
What could be causing this, and is there any way to work around it?
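One way to narrow down which call flips the flag is to run candidate steps one at a time in a fresh kernel and check `torch.cuda.is_initialized()` after each. The sketch below is a hypothetical helper (`first_step_that_flips` is not part of PyTorch or Lightning); it assumes each step is a zero-argument callable and that the kernel has not touched CUDA before the first step.

```python
from typing import Callable, List, Optional, Tuple

# A step is a (label, zero-argument callable) pair.
Step = Tuple[str, Callable[[], object]]


def first_step_that_flips(probe: Callable[[], bool], steps: List[Step]) -> Optional[str]:
    """Run `steps` in order and return the label of the first step after which
    `probe()` returns True. Returns "<before any step>" if the probe is already
    True on entry, or None if no step flips it. Steps after the culprit are
    not run."""
    if probe():
        return "<before any step>"
    for label, step in steps:
        step()
        if probe():
            return label
    return None


if __name__ == "__main__":
    try:
        import torch
    except ImportError:  # torch is not installed, so there is nothing to probe
        torch = None

    if torch is not None:
        # Low-level calls that a Trainer with accelerator/devices="auto"
        # plausibly makes; this list is an assumption, not Lightning's code path.
        steps: List[Step] = [
            ("torch.cuda.is_available()", torch.cuda.is_available),
            ("torch.cuda.device_count()", torch.cuda.device_count),
        ]
        print(first_step_that_flips(torch.cuda.is_initialized, steps))
```

In the notebook, appending `("L.Trainer(...)", lambda: L.Trainer(...))` with the arguments from the report as the last step shows whether the Trainer itself is the first thing to initialize CUDA, or whether an earlier call (or an earlier cell, reported as `"<before any step>"`) already did.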
What version are you seeing the problem on?
v2.2
How to reproduce the bug
No response
Error messages and logs
No response
Environment
No response
More info
No response