Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Home Page: https://lightning.ai

can't fit with ddp_notebook on a Vertex AI Workbench instance (CUDA initialized)

jasonbrancazio opened this issue

Bug description

Using this minimal code example:

import torch
import lightning as L

print(torch.cuda.is_initialized())
trainer = L.Trainer(
    accelerator="auto", 
    strategy="ddp_notebook",
    devices="auto", 
    max_epochs=1, 
    # callbacks=callbacks,
    log_every_n_steps=1
)
print(torch.cuda.is_initialized())

On Google Colab with a T4 attached, both print statements print "False" as expected.

On a Vertex AI Workbench instance with a T4 attached, the second statement prints "True": merely instantiating the Trainer initializes CUDA. This prevents fitting with DDP.
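For context: `ddp_notebook` is Lightning's fork-based DDP strategy, and CUDA cannot be re-initialized inside a process forked from a parent that has already created a CUDA context, which is why a Trainer that eagerly touches CUDA blocks fitting. A minimal stdlib sketch of the fork start method the strategy relies on (no Lightning or CUDA involved; `run_fork_demo` is an illustrative helper, not a Lightning API):

```python
import multiprocessing as mp

def _worker(q):
    # In a forked child, CUDA would be unusable if the parent had already
    # initialized it; here we only demonstrate that the child process runs.
    q.put("child ran")

def run_fork_demo():
    # ddp_notebook uses the "fork" start method so workers can be launched
    # from a live notebook kernel. Fork is also why a parent with an
    # existing CUDA context cannot hand usable GPUs to its workers.
    ctx = mp.get_context("fork")
    q = ctx.Queue()
    p = ctx.Process(target=_worker, args=(q,))
    p.start()
    msg = q.get()
    p.join()
    return msg
```

On Linux (where `"fork"` is available), `run_fork_demo()` returns `"child ran"`.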

What could be causing this, and is there any way to work around it?
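One way to narrow down the cause is to bisect: wrap each suspect import or call and check `torch.cuda.is_initialized()` before and after it. The sketch below is a hypothetical diagnostic helper (`check_step` is not a Lightning or PyTorch API), written to degrade gracefully when `torch` is absent:

```python
def cuda_initialized():
    """True if this process already has a CUDA context; False if torch is absent."""
    try:
        import torch
        return torch.cuda.is_initialized()
    except ImportError:
        return False

def check_step(label, fn):
    """Run one suspect step and report whether it initialized CUDA."""
    before = cuda_initialized()
    fn()
    after = cuda_initialized()
    print(f"{label}: cuda_initialized {before} -> {after}")
    return after and not before
```

Usage would look like `check_step("import lightning", lambda: __import__("lightning"))` followed by a step that builds the Trainer. If Trainer instantiation turns out to be the trigger, running training as a plain script with `strategy="ddp"` (or using `strategy="ddp_spawn"`) sidesteps the fork restriction, since neither forks the notebook process.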

What version are you seeing the problem on?

v2.2

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response