Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Home Page: https://lightning.ai

can't fit with ddp_notebook on a Vertex AI Workbench instance (CUDA initialized)

jasonbrancazio opened this issue

Bug description

Using this minimal code example:

import torch
import lightning as L

print(torch.cuda.is_initialized())
trainer = L.Trainer(
    accelerator="auto", 
    strategy="ddp_notebook",
    devices="auto", 
    max_epochs=1, 
    # callbacks=callbacks,
    log_every_n_steps=1
)
print(torch.cuda.is_initialized())

On Google Colab with a T4 attached, both print statements print "False" as expected.

On a Vertex AI Workbench instance with a T4 attached, the second statement prints "True": merely instantiating the Trainer initializes CUDA. This prevents fitting with DDP.
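For context: `ddp_notebook` is Lightning's fork-based DDP strategy, and CUDA cannot be re-initialized inside a process forked from a parent that has already created a CUDA context, which is why a Trainer that eagerly touches CUDA blocks fitting. A minimal stdlib sketch of the fork start method the strategy relies on (no Lightning or CUDA involved; `run_fork_demo` is an illustrative helper, not a Lightning API):

```python
import multiprocessing as mp

def _worker(q):
    # In a forked child, CUDA would be unusable if the parent had already
    # initialized it; here we only demonstrate that the child process runs.
    q.put("child ran")

def run_fork_demo():
    # ddp_notebook uses the "fork" start method so workers can be launched
    # from a live notebook kernel. Fork is also why a parent with an
    # existing CUDA context cannot hand usable GPUs to its workers.
    ctx = mp.get_context("fork")
    q = ctx.Queue()
    p = ctx.Process(target=_worker, args=(q,))
    p.start()
    msg = q.get()
    p.join()
    return msg
```

On Linux (where `"fork"` is available), `run_fork_demo()` returns `"child ran"`.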

What could be causing this, and is there any way to work around it?
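One way to narrow down the cause is to bisect: wrap each suspect import or call and check `torch.cuda.is_initialized()` before and after it. The sketch below is a hypothetical diagnostic helper (`check_step` is not a Lightning or PyTorch API), written to degrade gracefully when `torch` is absent:

```python
def cuda_initialized():
    """True if this process already has a CUDA context; False if torch is absent."""
    try:
        import torch
        return torch.cuda.is_initialized()
    except ImportError:
        return False

def check_step(label, fn):
    """Run one suspect step and report whether it initialized CUDA."""
    before = cuda_initialized()
    fn()
    after = cuda_initialized()
    print(f"{label}: cuda_initialized {before} -> {after}")
    return after and not before
```

Usage would look like `check_step("import lightning", lambda: __import__("lightning"))` followed by a step that builds the Trainer. If Trainer instantiation turns out to be the trigger, running training as a plain script with `strategy="ddp"` (or using `strategy="ddp_spawn"`) sidesteps the fork restriction, since neither forks the notebook process.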

What version are you seeing the problem on?

v2.2

How to reproduce the bug

No response

Error messages and logs

# Error messages and logs here please

Environment

Current environment
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 2.0):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response